Abstract

The Bayesian solution for tracking a target in clutter results naturally in a target state Gaussian mixture probability density function (pdf), which is a sum of weighted Gaussian pdfs, or mixture components. As new tracking measurements are received, the number of mixture components increases without bound, and eventually a reduced-component approximation of the original Gaussian mixture pdf is necessary to evaluate the target state pdf efficiently while maintaining good tracking performance. Many approximation methods exist, but these methods are either ad hoc or use rather crude approximation techniques. Recent studies have shown that a measure-function-based mixture reduction algorithm (MRA) may be used to generate a high-quality reduced-component approximation to the original target state Gaussian mixture pdf.

To date, the Integral Square Error (ISE) cost-function-based MRA has been shown to provide better tracking performance than any previously published Bayesian algorithm for tracking in heavy clutter. Research conducted for this thesis has led to the development of a new measure function, the Correlation Measure (CM), which gauges the similarity between a full- and a reduced-component Gaussian mixture pdf. This new measure function is implemented in an MRA and tested in a simulated scenario of a single target in heavy clutter. Results indicate that the CM MRA performs better than the ISE cost-function-based MRA, but only by a small margin.
Notation

$x(k)$: the discrete-time state random process vector at sample $k$; represents the state (location, velocity, etc.) of the target

$\hat{x}(k|k-1)$: the mean estimate of the state random process vector at sample $k$, using information only up to the $(k-1)$th measurement

$P(k|k)$: the covariance estimate of the state random process vector at sample $k$, using all available information up to the $k$th measurement

$X(k)$: the joint target state random process composite vector containing the state random process vectors of multiple targets at sample $k$

$z(k)$: the measurement random process vector at sample $k$

$Z(k)$: the composite measurement random process vector containing all of the measurement random process vectors at sample $k$

$z_k$: a realization of $z(k)$ at sample $k$

$Z_k$: a realization of $Z(k)$ at sample $k$

$Z^k$: the measurement history through sample $k$

$\mathcal{Z}^k$: the realized measurement history through sample $k$

$p(\cdot)$: the probability mass function (pmf) for the discrete random argument $(\cdot)$

$f(\cdot)$: the probability density function (pdf) for the continuous random argument $(\cdot)$

$L(\{z_i\}; \alpha)$: the likelihood function for the set of observations $\{z_i\}$, $i = 1, \ldots, n$, drawn from the distribution of $z$ whose pdf has an unknown scalar parameter $\alpha$

$\mathcal{N}\{x; \mu, \sigma^2\}$: a Gaussian pdf for the scalar random variable $x$, distributed with mean $\mu$ and variance $\sigma^2$

$\mathcal{N}\{x; \mu, P\}$: a Gaussian pdf for the vector random variable $x$, distributed with mean $\mu$ and covariance $P$

$\Pi_0, \Pi$: the sets of multivariate Gaussian mixture parameters for the original and reduced-component target state pdfs, respectively

$\alpha_0, \hat{\alpha}$: the true and estimated pdf scalar parameters, respectively

$N_f$: the number of models used in a multiple model algorithm, and thus the number of elemental filters

$M$: continuous random vector representing kinematic model uncertainty for non-switching models

$M(k)$: continuous random process vector representing kinematic model uncertainty for switching models

$M_i$: discrete random vector representing kinematic model uncertainty for non-switching models

$M_{i_k}$: discrete random process vector representing kinematic model uncertainty for switching models

$\mathcal{M}$: the continuous sample space of $M$ and $M(k)$, with $M, M(k) \in \mathcal{M} \subset \mathbb{R}^n$; or, the discrete sample space of $M_i$ and $M_{i_k}$, with $M_i, M_{i_k} \in \mathcal{M} = \{M_i\}_{i=1}^{N_f}$

$\{M_{i_\ell}\}_{\ell=1}^{k-1}$: the history of models for switching models from sample $\ell = 1$ through sample $\ell = k-1$

$\mu_i(k), \mu_{i_k}(k)$: the mode probability that model $i$ (for non-switching models) or model $i_k$ (for switching models) is correct, given the measurement history

$p_{i_k i_{k-1}}$: the mode Markov transition probability (from model $M_{i_{k-1}}$ to model $M_{i_k}$) for switching models, under the assumption that $M_{i_k}$ is a discrete Markov random process

$\Theta(k)$: the association event continuous random process vector representing the uncertainty in the origin of measurements

$\Theta_{i_k}$: the association event discrete random process vector representing the uncertainty in the origin of measurements

$\{\Theta_{i_\ell}\}_{\ell=1}^{k-1}$: the history of association events from sample $\ell = 1$ through sample $\ell = k-1$

$N_m(k)$: the number of measurements at sample $k$

$N_{DT,i_k}$: the number of measurements hypothesized under association event $\Theta_{i_k}$ as originating from targets hypothesized in a previous scan and detected in the current scan $k$

$N_{Tgt}$: the total number of existing targets hypothesized under the association event history through sample $k-1$

$N_{NT,i_k}$: the number of measurements hypothesized under association event $\Theta_{i_k}$ as originating from potential new targets at the current scan $k$

$N_{FT,i_k}$: the number of measurements hypothesized under association event $\Theta_{i_k}$ as originating from false sources at the current scan $k$

$N_H(k)$: the original number of hypothesized mixture components in the target state multivariate Gaussian mixture pdf at sample $k$, before mixture reduction is applied

$N_R(k)$: the reduced number of mixture components in the target state multivariate Gaussian mixture pdf at sample $k$, after mixture reduction is applied

$T\{\mathrm{pdf}_1, \mathrm{pdf}_2\}$: a true distance measure between two pdfs

$D\{\mathrm{pdf}_1, \mathrm{pdf}_2\}$: a pseudo-distance measure between two pdfs

$|A| = \det A$: the determinant of the matrix $A$

$\langle f(x), g(x) \rangle$: the inner product of two functions of $x$, defined as $\int_{-\infty}^{\infty} f(x)\,g(x)\,dx$
GAUSSIAN MIXTURE REDUCTION FOR
BAYESIAN  TARGET  TRACKING  IN  CLUTTER 


I.  Introduction 


Measurement origin uncertainty and target kinematics model parameter uncertainty are two fundamental problems encountered when estimating the state (position, velocity, etc.) of targets in clutter. By modeling these sources of uncertainty as random quantities, Bayes estimation may be used to estimate the unknown target state. If the uncertainty is modeled as a discrete random vector (i.e., unknown but constant over time), then the resulting Bayesian solution is computationally tractable. However, if the uncertainty is modeled as a discrete random process vector (i.e., unknown and generally changing over time), then the rigorous Bayesian solution is a summation of weighted Gaussian probability density functions in which the number of terms in the summation increases exponentially with time (assuming linear dynamics and measurement models, and normally distributed noise disturbances and initial state) pfjJ2 11138]. This solution is therefore computationally intractable, and approximation is necessary to implement it.
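To make the exponential growth concrete, the sketch below counts mixture components under a simplifying assumption made here purely for illustration: a single target, where every existing component branches into one hypothesis per received measurement plus one hypothesis that none of the measurements came from the target. The function name `mixture_components` is hypothetical.

```python
def mixture_components(measurements_per_scan):
    """Count mixture components after each scan.

    Illustrative assumption: one target, and each existing component spawns
    one child per measurement (target-originated) plus one child for the
    hypothesis that no measurement came from the target.
    """
    counts = []
    n = 1  # a single Gaussian prior before any scan
    for m in measurements_per_scan:
        n *= m + 1
        counts.append(n)
    return counts


print(mixture_components([3, 3, 3, 3, 3]))  # [4, 16, 64, 256, 1024]
```

Even with only three measurements per scan, the count passes one thousand after five scans under this assumption, which is why some form of reduction is applied at every scan.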


Two types of Gaussian mixture approximation to the rigorous Bayesian solution can be used to reduce the number of mixture components (summation terms in the Gaussian mixture pdf) when the measurement origin uncertainty is modeled as a discrete random process vector. The first type of approximation reduces the Gaussian mixture pdf to a single Gaussian pdf at the end of each tracking system scan. This approximation is used in the Probabilistic Data Association Filter (PDAF) and the Joint Probabilistic Data Association Filter (JPDAF) [5]. Although this approximation is simple and computationally inexpensive, it is rather crude when the target state pdf is multi-modal with well-spaced peaks [31]. The other type of approximation reduces the number of mixture components in the target state pdf at the end of each scan, for example by using a measure function (which quantifies the difference between the original pdf and the pdf with the reduced number of mixture components) to guide reduction decisions. This kind of approximation typically provides a better reduced-component representation of the original Gaussian mixture target state pdf, but at the expense of increased computational complexity and time.
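The first type of approximation can be illustrated with moment matching, a common way to collapse a Gaussian mixture into a single Gaussian: the single Gaussian keeps the overall mean and covariance of the mixture. The sketch below assumes that rule; the function name `collapse_mixture` and the example mixture are hypothetical. The example also shows the weakness noted above: two well-spaced peaks collapse into one broad peak located between them.

```python
import numpy as np


def collapse_mixture(weights, means, covs):
    """Collapse a Gaussian mixture to one Gaussian by matching mean and covariance.

    weights: (N,), means: (N, n), covs: (N, n, n).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalize the mixing weights
    mu = np.asarray(means, dtype=float)
    P = np.asarray(covs, dtype=float)
    mean = w @ mu                          # sum_i w_i mu_i
    d = mu - mean                          # component means relative to the overall mean
    within = np.einsum('i,ijk->jk', w, P)          # sum_i w_i P_i
    between = np.einsum('i,ij,ik->jk', w, d, d)    # sum_i w_i d_i d_i^T
    return mean, within + between


# Hypothetical bimodal mixture with well-spaced peaks at -2 and +2
mean, cov = collapse_mixture([0.5, 0.5], [[-2.0], [2.0]], [[[1.0]], [[1.0]]])
print(mean, cov)  # mean 0.0, variance 5.0
```

The collapsed density is centered at 0, between the two original peaks and in a region where the mixture itself has relatively low density.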


Various examples of reducing the number of mixture components are found in the literature. Singer et al. [32] proposed merging measurement histories after (N + 1) scans and/or limiting the pool of measurements included in the measurement association hypothesis generation process to decrease the number of mixture components. Reid [27] manages the explosive growth of mixture components by merging similar components and deleting unlikely ones. Alspach p], Lainiotis and Park [18], and Salmond [30] appear to be the first researchers to attack this problem by reducing the number of mixture components based on a measure function. Salmond, the last of this group, proposed optimally reducing the number of components by minimizing a cost function based on the covariance matrix of the Gaussian mixture. Fourteen years later, Williams [38]¹ also proposed using a cost function criterion for mixture reduction. However, Williams' cost function differed from that of Salmond by considering the impact of reduction actions on the entire Gaussian mixture pdf, and not just on the covariance matrix. Compared to other possible cost functions (such as the Kolmogorov Variational Distance p[][2JI2T][2SlSB], the Bhattacharyya coefficient [18], and Kullback-Leibler distances [17]), Williams' cost function can be evaluated in closed form for Gaussian mixtures, without approximation, which keeps it tractable. In fact, Williams' Integral Square Error (ISE) mixture reduction algorithm (MRA) produced what is thought to be the best single target tracking performance in a heavy clutter environment [381BDIBT].

¹This author owes a great deal of gratitude to Williams, since the majority of this thesis is based on his work.
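The closed-form evaluation mentioned above rests on the Gaussian product-integral identity $\int \mathcal{N}\{x; a, A\}\,\mathcal{N}\{x; b, B\}\,dx = \mathcal{N}\{a; b, A + B\}$, which turns the ISE between two mixtures into finite double sums over pairs of components. The sketch below computes $\int (f_a - f_b)^2\,dx$ that way. It illustrates the ISE itself only; it is not a reproduction of Williams' MRA (which also needs a search over candidate merge and delete actions), and the function names and example mixtures are hypothetical.

```python
import numpy as np


def gauss_pdf(x, mean, cov):
    """Gaussian density N{x; mean, cov} for vector (or 1-element) arguments."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = x - mean
    quad = d @ np.linalg.solve(cov, d)
    norm = np.sqrt((2.0 * np.pi) ** d.size * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm


def mixture_inner(mix1, mix2):
    """<f1, f2> = integral of f1(x) f2(x) dx for two Gaussian mixtures,
    using  int N{x; a, A} N{x; b, B} dx = N{a; b, A + B}."""
    (w1, m1, P1), (w2, m2, P2) = mix1, mix2
    total = 0.0
    for wi, mi, Pi in zip(w1, m1, P1):
        for wj, mj, Pj in zip(w2, m2, P2):
            total += wi * wj * gauss_pdf(mi, mj, np.add(Pi, Pj))
    return total


def ise(mix_a, mix_b):
    """Integral square error  int (f_a - f_b)^2 dx  between two Gaussian mixtures,
    each given as (weights, means, covariances)."""
    return (mixture_inner(mix_a, mix_a)
            - 2.0 * mixture_inner(mix_a, mix_b)
            + mixture_inner(mix_b, mix_b))


# Hypothetical scalar example: a 3-component mixture and a 2-component candidate reduction
full = ([0.40, 0.35, 0.25], [[0.0], [0.5], [4.0]], [[[1.0]], [[1.0]], [[1.0]]])
reduced = ([0.75, 0.25], [[0.25], [4.0]], [[[1.1]], [[1.0]]])
print(ise(full, reduced))  # nonnegative; zero only if the two mixtures coincide
```

Expanding the square into three inner products is what avoids numerical integration: each term is a weighted sum of Gaussian densities evaluated at component means.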

1.1  Research  Goal 

Recommendations in [38] indicate the potential for improving upon the results produced by Williams' ISE MRA. The goal of this research is to create a new MRA that offers better tracking performance and/or decreased computation time compared to Williams' algorithm. To that end, the fields of mathematical statistics and statistical inference are explored for ideas that may be useful for developing the new MRA.

1.2 Organization

With the above goal in mind, the remainder of this thesis is organized into six additional chapters. Chapters II, III, and IV provide the background concepts needed for this thesis. The Bayesian approach to target tracking is covered in Chapter II, which introduces the concept of target tracking, target tracking models, target state pdf estimation, Gaussian mixture pdfs, and target state pdf estimation in the presence of kinematics model parameter uncertainty and measurement origin uncertainty. Chapter III is motivated by the recommendation in [38] to use a maximum likelihood estimation measure function. It summarizes the maximum likelihood estimation and Expectation Maximization techniques for estimating pdfs, and it also develops a maximum likelihood estimation-based measure function. The final background chapter, Chapter IV, presents distance and pseudo-distance measure functions for approximating pdfs, as well as the MRAs developed by Williams [38] [40,31] and Salmond [30][31].

Based on the material covered in the background chapters, four new MRAs are developed, implemented, tested, and analyzed in Chapter V. The best-performing MRA is then tested in a simulated tracking scenario of a single target in heavy clutter, presented in Chapter VI. Finally, Chapter VII concludes this thesis by summarizing the results of this research and recommending ideas for future study.
II. The Bayesian Approach to Target Tracking


Target tracking is a means of determining the state of moving objects over some time interval of interest from observations of the objects in the presence of uncertainty. The state consists of one or more random processes that completely describe the behavior of the moving objects at any point in the time interval of interest. The movement of the objects is described by one or more dynamics models that mathematically characterize their motion. Observations of the objects are made using sensors. Uncertainty is present both in the objects' movement and in the observations of the objects; in the observations, it arises from inadequacies of the sensor measurement model as well as from the noise that corrupts the actual measurements.

One common approach to target tracking is Bayesian estimation. In most instances, this estimation method yields a real-time, recursive solution to a target tracking problem, which makes Bayesian estimation well suited to target tracking. The goal of this chapter is to present the Bayesian approach to target tracking and highlight its versatility in solving problems that exhibit uncertainty in the target state, the dynamics model parameters, and the origin of measurements. Bayesian estimation uses Bayes' rule to solve for the target state pdf, and it will be shown that the Bayesian approach to target tracking can be boiled down to one equation. This insight has been noted in other sources [4][5][38], and Bayes' rule is emphasized as the starting point for solving every problem in this chapter.

Target tracking scenarios can be roughly categorized by the number of targets, the number of sensors, and the number of dynamics models. Scenarios with multiple targets are more prevalent in practice, whereas single-target tracking scenarios are more often used as an instructional tool because of their relative simplicity. When available, multiple sensors usually provide more information about the state of the targets because the sensors' observations may be optimally combined using a multi-sensor fusion algorithm. In contrast, a single-sensor tracking system typically provides less information than a multi-sensor system using sensors with the same accuracy, but its tracking algorithms are less complicated. In some cases, tracking system designers do not know the correct dynamics model to incorporate in their system, so multiple possible models are included in the design to account for this source of uncertainty. Consequently, the design architecture requires some type of multiple model estimation algorithm. In other instances, though, one dynamics model may be enough to handle a given tracking scenario.

Figure 2.1 depicts a block diagram of the target tracking process, in which a presumed number of targets exist in the observation environment. At each scan, the sensors (or single sensor) obtain noise-corrupted measurements that are mathematically related to the state through the measurement model. The origin of each measurement is not known: each measurement could belong to any one of the targets, or it could have resulted from an erroneous detection due to clutter or sensor error (as characterized by the sensor's false alarm rate). Therefore, a tracking system may need to perform a measurement association process to reconcile the origin of each measurement and thereby remove some of the uncertainty in the overall tracking problem. The measurements are fed into a bank of estimators (or one estimator if only one model is used), each with its own presumed dynamics model and hypothesized measurement association set, and a final state estimate is made by appropriately combining the outputs of the separate state estimators.

This chapter is organized as follows. Target kinematics models are presented in Section 2.1. Section 2.2 introduces recursive Bayesian filtering for linear and nonlinear estimation of the target state pdf. A description of the multivariate Gaussian mixture pdf is provided in Section 2.3 to lay the groundwork for understanding the Bayesian solution when uncertainty exists in the kinematics model parameters (Section 2.4) or in the source of the measurements (Section 2.5). Understanding Gaussian mixture pdfs will also be useful for Chapter III, Estimating Probability Density Functions, and Chapter IV, Approximating Gaussian Mixtures and Mixture Reduction.
Figure 2.1: A conceptual block diagram of a target tracking algorithm.
2.1 Target Kinematics Models


Adequate target kinematics models are essential for accurate target tracking. Without sufficient models, even the most advanced model-based target tracking algorithm breaks down. Kinematics models, or dynamics models, are based on differential equations that characterize the evolution of the objects' movements over time. By its nature, a differential equation model is only an approximation of the actual target dynamics. Since the mathematical descriptions are approximations, there is uncertainty in the fidelity of the model, and the differential equations become stochastic in nature [21]. If the stochastic differential equations are restricted to the class of linear time-invariant problems, then these models may be succinctly written in state space form as (assuming no deterministic input)


$$\dot{x}(t) = F\,x(t) + G\,w(t) \qquad (2.1)$$


where $\dot{x}(t)$ is the time derivative of the n-dimensional state random process vector¹ $x(t)$, $F$ is the n-by-n time-invariant system dynamics matrix, $G$ is the n-by-s time-invariant noise input matrix, and $w(t)$ is the s-dimensional model noise process vector (assumed zero-mean, uncorrelated in time or “white,” and Gaussian) [21]. Equation (2.1) is called the system dynamics or kinematics model. Likewise, uncertain observations are made at time samples $t_k$ and are given by the mathematical model


$$z(t_k) = H\,x(t_k) + v(t_k) \qquad (2.2)$$

where $z(t_k)$ is the m-dimensional measurement random process vector at sample time $t_k$, $H$ is the m-by-n time-invariant measurement matrix, and $v(t_k)$ is the m-dimensional measurement noise process vector (also assumed zero-mean, white, and Gaussian) [21]. Equation (2.2) is termed the measurement model. A set of initial conditions must be specified to obtain a particular solution to the above differential equations. This set of initial conditions is usually unknown, in which case it may be represented by a Gaussian random vector with mean and covariance specified by

$$\begin{aligned} E\{x(t_0)\} &= \hat{x}_0 \\ E\{[x(t_0) - \hat{x}_0][x(t_0) - \hat{x}_0]^T\} &= P_0. \end{aligned} \qquad (2.3)$$

¹The state random process vector may represent kinematic quantities such as position, velocity, and acceleration.


Eventually, these models will likely be implemented on a computer, and a discrete-time form of Equation (2.1) will be required. If $k$ is the sample index², then the shift-invariant³ discrete-time models are


$$\begin{aligned} \text{System Dynamics Model:}\quad & x(k) = \Phi(k, k-1)\,x(k-1) + G_d\,w_d(k-1) \\ \text{Measurement Model:}\quad & z(k) = H\,x(k) + v(k) \end{aligned} \qquad (2.4)$$


where the previous nomenclature holds, the subscript $d$ stands for “discrete-time,” and $\Phi(k, k-1)$ is the n-by-n discrete-time state transition matrix given by $\Phi(k, k-1) = e^{F(t_k - t_{k-1})} = e^{FT}$ [21].


The initial conditions for these difference equations are

$$\begin{aligned} E\{x(0)\} &= \hat{x}_0 \\ E\{[x(0) - \hat{x}_0][x(0) - \hat{x}_0]^T\} &= P_0. \end{aligned} \qquad (2.5)$$


²The sampling interval is defined as $T = t_k - t_{k-1}$.

³In the discrete-time formulation, “time-invariant” becomes “shift-invariant.”
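For a time-invariant $F$, the transition matrix $\Phi(k, k-1) = e^{FT}$ can be checked numerically. The sketch below assumes a hypothetical sampling interval $T = 1$ and, as an example chosen here, the continuous-time constant velocity dynamics matrix for the state $[x, v_x, y, v_y]^T$ (position rates equal the velocities; velocity rates are driven only by noise). Because this $F$ is nilpotent ($F^2 = 0$), the truncated Taylor series used here is exact and yields $I + FT$, the same transition matrix that appears in the constant velocity model of Equation (2.8). The helper name `expm_taylor` is hypothetical.

```python
import numpy as np


def expm_taylor(A, terms=30):
    """Matrix exponential e^A by a truncated Taylor series.

    Adequate for small ||A||; exact for nilpotent A such as the CV dynamics matrix.
    """
    n = A.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for j in range(1, terms):
        term = term @ A / j          # A^j / j!
        result = result + term
    return result


T = 1.0  # hypothetical sampling interval t_k - t_{k-1}
# Assumed continuous-time CV dynamics matrix for state [x, vx, y, vy]
F = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])
Phi = expm_taylor(F * T)  # Phi(k, k-1) = e^{F T}
print(Phi)
```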
The first- and second-order statistics of the noise process vectors are


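$$\begin{aligned} E\{w_d(k)\} &= 0 \\ E\{w_d(k)\,w_d^T(l)\} &= Q_d(k)\,\delta_{kl} \\ E\{v(k)\} &= 0 \\ E\{v(k)\,v^T(l)\} &= R(k)\,\delta_{kl} \\ E\{v(k)\,w_d^T(l)\} &= 0 \\ E\{v(k)\,x^T(0)\} &= 0 \\ E\{w_d(l)\,x^T(0)\} &= 0 \end{aligned} \qquad (2.6)$$

where $\delta_{kl}$ is the Kronecker delta function:

$$\delta_{kl} = \begin{cases} 1 & \text{if } k = l \\ 0 & \text{otherwise.} \end{cases}$$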


Since $x(0)$, $w_d(k)$, and $v(k)$ are assumed jointly Gaussian⁴, the last three lines of Equation (2.6), which state that they are mutually uncorrelated, imply that they are independent.


Before continuing to the next section, it should be noted that nonlinear kinematics models are also possible. In discrete time, the nonlinear shift-invariant models are [13][21]


$$\begin{aligned} \text{System Dynamics Model:}\quad & x(k) = \phi[x(k-1)] + G_d\,w_d(k-1) \\ \text{Measurement Model:}\quad & z(k) = h[x(k)] + v(k) \end{aligned} \qquad (2.7)$$

where $\phi[\cdot]$ is the nonlinear system dynamics vector function and $h[\cdot]$ is the nonlinear measurement vector function. Note that, in general, these equations may be shift-varying, in which case $\phi[\cdot]$ and $h[\cdot]$ would also be functions of the appropriate time index.
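As an illustration of the measurement line of Equation (2.7), the sketch below uses a range-and-bearing sensor, a common nonlinear $h[\cdot]$ chosen here purely as a hypothetical example (the text does not specify one at this point). It assumes the state ordering $[x, v_x, y, v_y]^T$, a sensor located at the origin, and a hypothetical noise covariance $R$; the function names are likewise hypothetical.

```python
import numpy as np


def h_range_bearing(x):
    """Hypothetical nonlinear measurement function h[x] for state [x, vx, y, vy]:
    range and bearing (radians) from a sensor assumed to sit at the origin."""
    px, py = x[0], x[2]
    return np.array([np.hypot(px, py), np.arctan2(py, px)])


def measure(x, R, rng):
    """z(k) = h[x(k)] + v(k), with v(k) zero-mean Gaussian with covariance R."""
    v = rng.multivariate_normal(np.zeros(2), R)
    return h_range_bearing(x) + v


rng = np.random.default_rng(0)
R = np.diag([0.01, 1e-4])  # hypothetical range and bearing noise variances
z = measure(np.array([3.0, 0.0, 4.0, 0.0]), R, rng)
print(h_range_bearing(np.array([3.0, 0.0, 4.0, 0.0])))  # range 5.0, bearing atan2(4, 3)
```

The noise still enters additively, exactly as in Equation (2.7); only the map from state to measurement is nonlinear.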

⁴If two jointly Gaussian random vectors are uncorrelated, then they are also independent. In general, though, uncorrelated random vectors are not necessarily independent [20][21][35].
One common linear system dynamics model used in target tracking is the constant velocity (CV) model [7]⁵. The CV model is of particular interest since it was used in simulations by Salmond [30][31] and Williams [381SQ1III] to evaluate their Gaussian mixture reduction methods, and it will be used in simulations to evaluate new Gaussian mixture reduction approaches later in this thesis. For a target traveling in the x-y plane, the discrete-time CV model is


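$$\begin{bmatrix} x(k) \\ v_x(k) \\ y(k) \\ v_y(k) \end{bmatrix} = \begin{bmatrix} 1 & T & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x(k-1) \\ v_x(k-1) \\ y(k-1) \\ v_y(k-1) \end{bmatrix} + \begin{bmatrix} T^2/2 & 0 \\ T & 0 \\ 0 & T^2/2 \\ 0 & T \end{bmatrix} \begin{bmatrix} w_x(k-1) \\ w_y(k-1) \end{bmatrix} \qquad (2.8)$$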


where $T$ is the sampling interval, $x(k)$ and $y(k)$ are the x and y target positions, $v_x(k)$ and $v_y(k)$ are the target velocities in the x and y directions, and $w_x(k-1)$ and $w_y(k-1)$ are the model noise processes in the x and y coordinates. The model noise process is zero-mean, and its covariance matrix is


$$Q_d(k) = Q_d = \begin{bmatrix} q_{dx} & 0 \\ 0 & q_{dy} \end{bmatrix} \qquad (2.9)$$
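A minimal simulation of Equations (2.8) and (2.9) follows. It is a sketch for intuition, not the simulation code used in this thesis; the initial state, $T$, $q_{dx}$, $q_{dy}$, the random seed, and the function names `cv_model` and `simulate` are hypothetical choices.

```python
import numpy as np


def cv_model(T, q_dx, q_dy):
    """Phi, G_d and Q_d of the discrete-time CV model, Equations (2.8)-(2.9),
    for the state ordering [x, vx, y, vy]."""
    Phi = np.array([[1.0, T, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, T],
                    [0.0, 0.0, 0.0, 1.0]])
    Gd = np.array([[T**2 / 2, 0.0],
                   [T, 0.0],
                   [0.0, T**2 / 2],
                   [0.0, T]])
    Qd = np.diag([q_dx, q_dy])
    return Phi, Gd, Qd


def simulate(x0, steps, T=1.0, q_dx=0.1, q_dy=0.1, seed=0):
    """Draw one trajectory of x(k) = Phi x(k-1) + G_d w_d(k-1),
    with w_d zero-mean, white, Gaussian with covariance Q_d."""
    rng = np.random.default_rng(seed)
    Phi, Gd, Qd = cv_model(T, q_dx, q_dy)
    x = np.asarray(x0, dtype=float)
    path = [x]
    for _ in range(steps):
        wd = rng.multivariate_normal(np.zeros(2), Qd)  # w_d(k-1)
        x = Phi @ x + Gd @ wd
        path.append(x)
    return np.array(path)


# Hypothetical start: at the origin, moving with velocity (1.0, 0.5)
path = simulate([0.0, 1.0, 0.0, 0.5], steps=10)
print(path.shape)  # (11, 4)
```

The $G_d$ structure means each noise sample perturbs velocity by $T w$ and position by $T^2 w / 2$, i.e., the noise acts like a piecewise-constant acceleration over each sampling interval.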


⁵Various other system dynamics models, as well as measurement models, may be found in [4][5].


2.2  Recursive  Bayesian  Filtering 

A recursive Bayesian filter⁶ is used to estimate the target state $x(k)$. For linear models, the linear recursive Bayesian filter, which is the well-known Kalman filter, provides an optimal target state estimate by almost all practical criteria [21]. This filter is the topic of Subsection 2.2.1. Nonlinear recursive Bayesian filters are used when the dynamics models are nonlinear, the measurement models are nonlinear, or both; in general, however, nonlinear filters do not enjoy the same claim to optimality as the Kalman filter, and they require approximations to yield a finite-dimensional form. Nonlinear recursive Bayesian filters are the subject of Subsection 2.2.2.

⁶The word “filter,” as it is used in the context of estimation, differs from the same word used in deterministic signal processing. In this thesis, a filter should be thought of as a type of estimator that uses observations up to and including sample $k$ to determine the target state [13].

2.2.1 Linear Recursive Bayesian Filtering. For the discrete-time linear models listed in Section 2.1, the Kalman filter is the optimal estimator of the state random process vector [21]. The Kalman filter may be derived from a Bayesian point of view, as in [21], or from an orthogonal projection perspective, as in [13][35]. This subsection presents the key steps from [21] in the derivation of the Kalman filter using the Bayes estimation technique. These steps should give the reader some insight into the Kalman filter equations.

Before beginning the derivation, it is worth introducing some preliminary information about the Kalman filter. First, each iteration of the Kalman filter operates in two stages, referred to as time propagation and measurement update. The notation “(estimate at time index | measurement history through time index)” allows one to follow the time-varying quantities of the filter (e.g., $\hat{x}(\cdot|\cdot)$, the state random process vector mean estimate) at each stage of a filter iteration. For example, the notation $\hat{x}(k|k-1)$ signifies that the state mean estimate has been propagated in time through the kinematics model to sample $k$, but measurements at sample $k$ have not yet been incorporated into the estimate. Second, a composite vector $Z^k$, called the measurement history, is formed from all of the measurement random process vectors from the time of the initial measurement through the current sample, $k$ [21]. That is,


$$Z^k = \begin{bmatrix} z(1) \\ \vdots \\ z(k) \end{bmatrix} \qquad (2.10)$$


Likewise, a composite vector of realized measurements $\mathcal{Z}^k$, termed the realized measurement history, is defined as the vector containing all of the actual measurements through sample $k$, and is given by


$$\mathcal{Z}^k = \begin{bmatrix} z_1 \\ \vdots \\ z_k \end{bmatrix} \qquad (2.11)$$


where $z_i$, $i = 1, \ldots, k$, is the realization of the measurement random process vector $z(i)$ at sample $i$ [21].


Third, the initial state random process vector $x(0)$ is assumed to be Gaussian, and since the dynamics and measurement models are linear (and the noise processes are Gaussian), the state random process vector at any sample $k$ is also Gaussian⁷. Additionally, a multivariate Gaussian pdf is completely specified by the mean and covariance of the Gaussian random vector it represents, and expressions for these parameters can be found by applying the expectation operator, $E\{\cdot\}$, to the discrete-time models given in Equation (2.4) at the appropriate time index. Thus, the pdf of the state random process vector $x(k)$ may be found at any sample $k$.

⁷Linear transformations (e.g., the linear dynamics and measurement models) of a Gaussian random vector are also Gaussian [21], and linear combinations of jointly Gaussian random vectors are similarly Gaussian.

Using this preliminary information, the derivation of the Kalman filter equations begins with the pdf of the Gaussian state random process vector $x(k-1)$ conditioned on the measurement history through sample $k-1$ [21]:


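$$f\left(x(k-1) \mid Z^{k-1}\right) = \frac{1}{(2\pi)^{n/2}\sqrt{\det P(k-1|k-1)}} \exp\left\{-\tfrac{1}{2}(\cdot)^T P(k-1|k-1)^{-1}(\cdot)\right\}, \qquad (\cdot) = x(k-1) - \hat{x}(k-1|k-1) \qquad (2.12)$$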


or

$$f\left(x(k-1) \mid Z^{k-1}\right) = \mathcal{N}\{x(k-1);\, \hat{x}(k-1|k-1),\, P(k-1|k-1)\}. \qquad (2.13)$$


The values of $\hat{x}(k-1|k-1)$ and $P(k-1|k-1)$ are the outputs from the last iteration of the Kalman filter.
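Equation (2.12) translates directly into code. The sketch below evaluates $\mathcal{N}\{x; \mu, P\}$ for a vector or scalar argument; it solves a linear system rather than forming $P^{-1}$ explicitly, a standard numerical choice rather than something prescribed by the derivation. The function name `gaussian_pdf` is hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, P):
    """N{x; mean, P} as written in Equation (2.12); scalars are treated as 1-vectors."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    P = np.atleast_2d(np.asarray(P, dtype=float))
    n = x.size
    d = x - mean                          # the (.) term in Equation (2.12)
    quad = d @ np.linalg.solve(P, d)      # (.)^T P^{-1} (.) without forming P^{-1}
    return np.exp(-0.5 * quad) / ((2.0 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(P)))


print(gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))  # 1 / (2 pi), about 0.1592
```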


Next, in the time propagation stage, the state random process vector $x(k-1)$ is propagated in time through the linear dynamics model (the first line of Equation (2.4)). Since the model is linear and the state and model noise process vectors are jointly Gaussian, the state random process vector at sample $k$ conditioned on the measurement history through sample $k-1$ is also Gaussian [21]. This fact can be shown by applying Bayes' rule⁸ and the result that the product of two Gaussian pdfs is proportional to a Gaussian pdf [21]. The state random process vector pdf is given by $\mathcal{N}\{x(k);\, \hat{x}(k|k-1),\, P(k|k-1)\}$, and its mean and covariance parameters are

$$\begin{aligned} \hat{x}(k|k-1) &= \Phi(k, k-1)\,\hat{x}(k-1|k-1) \\ P(k|k-1) &= \Phi(k, k-1)\,P(k-1|k-1)\,\Phi(k, k-1)^T + G_d\,Q_d(k-1)\,G_d^T \end{aligned} \qquad (2.14)$$

Equations (2.14) are the Kalman filter time propagation equations.

⁸The law of conditional probability for two pdfs states that

$$f(A|B) = \frac{f(A, B)}{f(B)}.$$

Bayes' rule follows from the law of conditional probability for pdfs and is given by

$$f(A|B) = \frac{f(B|A)\,f(A)}{f(B)}.$$
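The two lines of Equation (2.14) translate directly into a short function. The sketch below is illustrative only; the function name `kf_propagate` and the one-dimensional constant velocity example (state [position, velocity], $T = 1$, unit noise variance, identity prior covariance) are hypothetical choices.

```python
import numpy as np


def kf_propagate(x_post, P_post, Phi, Gd, Qd):
    """Kalman filter time propagation, Equation (2.14).

    Maps x_hat(k-1|k-1), P(k-1|k-1) to x_hat(k|k-1), P(k|k-1).
    """
    x_prior = Phi @ x_post
    P_prior = Phi @ P_post @ Phi.T + Gd @ Qd @ Gd.T
    return x_prior, P_prior


# Hypothetical 1-D constant velocity example: state [position, velocity]
T = 1.0
Phi = np.array([[1.0, T], [0.0, 1.0]])
Gd = np.array([[T**2 / 2], [T]])
Qd = np.array([[1.0]])
x_pred, P_pred = kf_propagate(np.array([0.0, 1.0]), np.eye(2), Phi, Gd, Qd)
print(x_pred)   # [1.0, 1.0]
print(P_pred)   # [[2.25, 1.5], [1.5, 2.0]]
```

The mean is simply pushed through the dynamics, while the covariance grows by the process noise term $G_d Q_d G_d^T$, which is why uncertainty increases between measurements.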


The next step in the derivation of the Kalman filter equations is to incorporate the current measurement, $z(k)$, in the measurement update stage. Using Bayes' rule and noting that Equation (2.10) may be written as

$$Z^k = \begin{bmatrix} Z^{k-1} \\ z(k) \end{bmatrix},$$

the pdf of the state random process vector conditioned on all of the measurements through sample $k$ is [21]


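$$
\begin{aligned}
f\left(x(k)\mid Z^{k}\right)
&= \frac{f\left(x(k), z(k), Z^{k-1}\right)}{f\left(z(k), Z^{k-1}\right)} && \text{(conditional probability for pdfs)}\\
&= \frac{f\left(z(k)\mid x(k), Z^{k-1}\right) f\left(x(k), Z^{k-1}\right)}{f\left(z(k)\mid Z^{k-1}\right) f\left(Z^{k-1}\right)} && \text{(more of the same)}\\
&= \frac{f\left(z(k)\mid x(k), Z^{k-1}\right) f\left(x(k)\mid Z^{k-1}\right) f\left(Z^{k-1}\right)}{f\left(z(k)\mid Z^{k-1}\right) f\left(Z^{k-1}\right)} && \text{(and again)}\\
&= \frac{f\left(z(k)\mid x(k), Z^{k-1}\right)\,\mathcal{N}\left\{x(k);\ \hat{x}(k\mid k-1),\ P(k\mid k-1)\right\}}{f\left(z(k)\mid Z^{k-1}\right)}
\end{aligned}
\tag{2.15}
$$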


From a broad perspective, the last line of Equation (2.15) provides the justification for the term “recursive Bayesian filter.” The state random process vector pdf at the current sample $k$, conditioned on the latest measurements, cannot be calculated until the state random process vector pdf conditioned on the measurements up to the previous time index is determined. That is, Equation (2.14) must first be evaluated before the pdf $\mathcal{N}\{x(k);\ \hat{x}(k\mid k-1),\ P(k\mid k-1)\}$ can be specified. Hence, the recursive nature of the filter is apparent in the last line of Equation (2.15). The terms $f(z(k)\mid x(k), Z^{k-1})$ and $f(z(k)\mid Z^{k-1})$ are found with relative ease by noting that they are Gaussian pdfs, which are completely specified by their respective means and covariances [21]. One may find these parameters by applying the conditional expectation operation to the discrete-time measurement model given in the second line of Equation (2.4). However, determining the final form of $f(x(k)\mid Z^{k})$ is not so easy, since it requires a rather unpleasant amount of linear algebraic manipulation to show that $f(x(k)\mid Z^{k})$ is a Gaussian pdf.


Returning to the derivation, the initial representation of Equation (2.15) is a complicated algebraic form that bears very little apparent resemblance to a Gaussian pdf [21]. However, after several pages of algebraic manipulation, $f(x(k)\mid Z^{k})$ assumes the much nicer form [21]

$$
f\left(x(k)\mid Z^{k}\right) = \frac{\exp\left\{-\frac{1}{2}\,[x(k)-\hat{x}(k\mid k)]^{T}\,P(k\mid k)^{-1}\,[x(k)-\hat{x}(k\mid k)]\right\}}{(2\pi)^{n/2}\sqrt{\det P(k\mid k)}} = \mathcal{N}\left\{x(k);\ \hat{x}(k\mid k),\ P(k\mid k)\right\}.
$$


The parameters of this pdf are (note that $z_k$ is the realization of the random vector $z(k)$ at sample $k$)

$$
\begin{aligned}
\hat{x}(k\mid k) &= \hat{x}(k\mid k-1) + K(k)\left[z_k - H\,\hat{x}(k\mid k-1)\right]\\
P(k\mid k) &= P(k\mid k-1) - K(k)\,H\,P(k\mid k-1)
\end{aligned}
\tag{2.16}
$$

and they are referred to as the Kalman filter measurement update equations [21]. Note that the term $H\hat{x}(k\mid k-1) = \hat{z}(k\mid k-1)$ is the conditional mean of the measurement at sample $k$, or the predicted measurement. Also, the term $K(k)$ is called the Kalman gain, which is given by [21]

$$
K(k) = P(k\mid k-1)\,H^{T}\left[H\,P(k\mid k-1)\,H^{T} + R(k)\right]^{-1}.
\tag{2.17}
$$


The previous equation completes the presentation of the key steps in the derivation of the Kalman filter equations using the Bayesian estimation technique.
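As a numerical illustration of this result, the sketch below normalizes the product of a scalar Gaussian prior and a Gaussian likelihood on a dense grid, as in the last line of Equation (2.15), and compares the resulting mean and variance with Equations (2.16) and (2.17). It is a minimal example under assumed values, not part of the original derivation; the numbers `x_pred`, `P_pred`, `H`, `R`, and `z_k` and the helper `gaussian` are hypothetical.

```python
import numpy as np


def gaussian(x, mean, var):
    """Scalar Gaussian pdf N{x; mean, var}."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


# Hypothetical scalar values: prior N{x; x_pred, P_pred}, measurement z = H x + v with v ~ N(0, R)
x_pred, P_pred = 1.0, 4.0
H, R, z_k = 1.0, 1.0, 3.0

# Bayes update on a grid: likelihood times prior, normalized numerically (this plays the role of f(z(k) | Z^{k-1}))
x = np.linspace(-20.0, 20.0, 200_001)
dx = x[1] - x[0]
posterior = gaussian(z_k, H * x, R) * gaussian(x, x_pred, P_pred)
posterior /= posterior.sum() * dx
grid_mean = np.sum(x * posterior) * dx
grid_var = np.sum((x - grid_mean) ** 2 * posterior) * dx

# Closed-form scalar Kalman measurement update, Equations (2.16) and (2.17)
K = P_pred * H / (H * P_pred * H + R)
kf_mean = x_pred + K * (z_k - H * x_pred)
kf_var = P_pred - K * H * P_pred

print(f"grid: mean={grid_mean:.4f}, var={grid_var:.4f}")
print(f"KF:   mean={kf_mean:.4f}, var={kf_var:.4f}")
```

With these values both approaches give a mean of 2.6 and a variance of 0.8, to within the grid's numerical error.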
In summary, given the discrete-time models in Equation (2.4), including the initial conditions in Equation (2.5) and the noise statistics given in Equation (2.6), one iteration of the Kalman filter is calculated by using the Kalman filter time propagation equations:

• Propagate the mean estimate to sample $k$ according to the system dynamics model:

$$
\hat{x}(k\mid k-1) = \Phi(k,k-1)\,\hat{x}(k-1\mid k-1)
$$

• Propagate the covariance estimate to sample $k$ according to the system dynamics model and add the covariance of the model noise process vector:

$$
P(k\mid k-1) = \Phi(k,k-1)\,P(k-1\mid k-1)\,\Phi(k,k-1)^{T} + G_d\,Q_d(k-1)\,G_d^{T},
$$

followed by the Kalman filter measurement update equations:

• Calculate the covariance of the residual $r(k)$ and the realized residual $r_k$, which is the residual evaluated with the observed measurement $z_k$:

$$
\begin{aligned}
r(k) &= z(k) - \hat{z}(k\mid k-1)\\
S(k) &= E\{r(k)\,r^{T}(k)\} = H\,P(k\mid k-1)\,H^{T} + R(k)\\
r_k &= \left.r(k)\right|_{z(k)=z_k} = z_k - \hat{z}(k\mid k-1)
\end{aligned}
$$

• Compute the Kalman gain as a function of the uncertainty in the system dynamics model and the measurement model at sample $k$:

$$
K(k) = P(k\mid k-1)\,H^{T}\,S^{-1}(k)
$$

• Add the “new information” from the measurement at sample $k$ to the propagated mean estimate to form the measurement-updated conditional mean estimate:

$$
\hat{x}(k\mid k) = \hat{x}(k\mid k-1) + K(k)\,r_k
$$

• Subtract the gain-weighted term $K(k)\,H\,P(k\mid k-1)$ from the propagated covariance estimate:

$$
P(k\mid k) = P(k\mid k-1) - K(k)\,H\,P(k\mid k-1).
$$
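The two stages listed above can be sketched in Python with NumPy. This is a minimal illustration, not the implementation used in this thesis; the function names `kf_predict` and `kf_update` and the constant-velocity example matrices are hypothetical.

```python
import numpy as np


def kf_predict(x_est, P_est, Phi, Gd, Qd):
    """Time propagation stage, Equation (2.14)."""
    x_pred = Phi @ x_est
    P_pred = Phi @ P_est @ Phi.T + Gd @ Qd @ Gd.T
    return x_pred, P_pred


def kf_update(x_pred, P_pred, z_k, H, R):
    """Measurement update stage, written with the residual r_k and its covariance S(k)."""
    r_k = z_k - H @ x_pred                 # realized residual
    S = H @ P_pred @ H.T + R               # residual covariance S(k)
    # K = P H^T S^-1; solving S X = H P gives X = K^T because P and S are symmetric
    K = np.linalg.solve(S, H @ P_pred).T
    x_upd = x_pred + K @ r_k
    P_upd = P_pred - K @ H @ P_pred
    return x_upd, P_upd, r_k, S


# Hypothetical 1-D constant-velocity target with position-only measurements, T = 1 s
T = 1.0
Phi = np.array([[1.0, T], [0.0, 1.0]])
Gd = np.eye(2)
Qd = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])

x_est, P_est = np.array([0.0, 1.0]), np.eye(2)
for z in [1.1, 1.9, 3.2]:
    x_pred, P_pred = kf_predict(x_est, P_est, Phi, Gd, Qd)
    x_est, P_est, r_k, S = kf_update(x_pred, P_pred, np.array([z]), H, R)
    print(f"z={z}: position={x_est[0]:.3f}, velocity={x_est[1]:.3f}")
```

Using a linear solve rather than forming $S^{-1}(k)$ explicitly is a common numerical choice; it gives the same gain as the bulleted equations.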

The Kalman gain plays a pivotal role in the Kalman filter because it reflects the amount of confidence placed in the information provided by the measurements relative to that of the propagated information [35]. If the uncertainty in the measurements is large (as indicated by large entries in $R(k)$), then the Kalman gain is small (notice the factor $[\cdots + R(k)]^{-1}$ in Equation (2.17)), and the impact of the measurement on the state random process vector mean and covariance estimates is small. Likewise, if the measurement uncertainty is small, then the entries in the matrix $R(k)$ are small, and the measurements have a greater impact on the updated state random process vector mean and covariance estimates.
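The effect described above is easy to see in the scalar case, where Equation (2.17) reduces to $K = PH/(HPH + R)$. The following sketch uses hypothetical numbers (a fixed propagated variance and $H = 1$) to show the gain shrinking, and the updated variance approaching the propagated variance, as $R$ grows; the helper `scalar_gain` is illustrative only.

```python
def scalar_gain(P_pred, H, R):
    """Scalar form of Equation (2.17): K = P H / (H P H + R)."""
    return P_pred * H / (H * P_pred * H + R)


P_pred, H = 4.0, 1.0                      # hypothetical propagated variance and measurement gain
for R in [0.01, 1.0, 100.0]:
    K = scalar_gain(P_pred, H, R)
    P_upd = P_pred - K * H * P_pred       # scalar form of the covariance update in (2.16)
    print(f"R = {R:>6}: K = {K:.4f}, P(k|k) = {P_upd:.4f}")
```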

From a visual perspective, the conditional pdfs $f(x(k-1)\mid Z^{k-1})$, $f(x(k)\mid Z^{k-1})$, and $f(x(k)\mid Z^{k})$ are modified according to the Kalman filter equations. For a typical problem involving a scalar state random process, these pdfs appear in Figure 2.2. Initially, before the time propagation and measurement update stages are entered, the scalar state process has a conditional pdf represented by the dotted trace in the figure. After time propagation, the scalar state process pdf, shown as the dash-dotted trace, is modified according to Equation (2.14), and the pdf is wider than before since the dynamics model adds uncertainty. Finally, the measurement is incorporated, and the scalar state random process pdf becomes narrower than the propagated pdf, as shown by the solid trace. This narrowing of the pdf is expected since, according to Equation (2.16), the variance of the updated scalar state random process is smaller than that of the propagated scalar state random process; this is the benefit obtained from the most recent measurement.

Figure 2.2: Typical conditional scalar state pdfs before time propagation, after time propagation, and after measurement update.

2.2.2 Nonlinear Recursive Bayesian Filtering. Nonlinear recursive Bayesian filters are used when the dynamics and/or measurement models are nonlinear. Nonlinear transformations destroy the Gaussian nature of the target state random process vector, and the mean and covariance of the state random process vector no longer completely describe the target state pdf. In the best case, if the nonlinearity in the transformation is negligible, then a Gaussian pdf may be a good approximation to the true target state pdf. In the worst case, if the nonlinearity in the transformation is substantial, then the true target state pdf will likely bear little resemblance to a Gaussian pdf. In either case, an optimal nonlinear recursive Bayesian filter would, in general, need to compute an infinite number of moments to characterize the exact target state pdf [21], [22]. By contrast, when the models are linear and the noise processes are Gaussian, only the mean and covariance of the state random process vector are required to describe the target state pdf completely.


Tracking problems that require slightly nonlinear models are adequately solved with the extended Kalman filter (EKF). Problems that use moderately nonlinear models, or models in which analytic expressions for the Jacobian of the nonlinear system dynamics or nonlinear measurement vector functions do not exist⁹, are potentially better suited to an unscented Kalman filter solution. In the case of highly nonlinear models, or models in which closed-form expressions for the Jacobian of the nonlinear system dynamics or nonlinear measurement vector functions are not available, the tracking problem could be solved using a particle filter. The EKF is introduced in this subsection, but the unscented Kalman filter and the particle filter are not covered.


For nonlinear models, both the nonlinear system dynamics vector function and the nonlinear measurement vector function can be written as a Taylor series expansion about some nominal point, as long as analytic expressions for the Jacobian and higher-order derivatives of both nonlinear vector functions exist. If the vector functions are slightly nonlinear, then the second- and higher-order terms in their respective expansions may be justifiably ignored to create a first-order linear approximation of each nonlinear vector function [22]. Then, the Kalman filter Equations (2.14), (2.16), and (2.17) can be used by replacing the state transition and measurement matrices with the Jacobians of the nonlinear system dynamics and measurement vector functions, respectively, evaluated at some appropriate point [21]. These steps form the basis for the EKF derivation.
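When the Jacobian required above is tedious to derive by hand but the vector function itself can still be evaluated, one common workaround (not described in this thesis) is a central-difference approximation. The sketch below is a minimal illustration under that assumption; `numerical_jacobian` and the range-bearing measurement function `h_range_bearing` are hypothetical examples, and the result is checked against the analytic Jacobian.

```python
import numpy as np


def numerical_jacobian(f, x0, eps=1e-6):
    """Central-difference approximation of df/dx evaluated at the nominal point x0."""
    x0 = np.asarray(x0, dtype=float)
    f0 = np.atleast_1d(f(x0))
    J = np.zeros((f0.size, x0.size))
    for j in range(x0.size):
        step = np.zeros_like(x0)
        step[j] = eps
        J[:, j] = (np.atleast_1d(f(x0 + step)) - np.atleast_1d(f(x0 - step))) / (2.0 * eps)
    return J


def h_range_bearing(x):
    """Hypothetical nonlinear measurement: range and bearing to a planar position (px, py)."""
    px, py = x[0], x[1]
    return np.array([np.hypot(px, py), np.arctan2(py, px)])


x_nom = np.array([3.0, 4.0])
J_num = numerical_jacobian(h_range_bearing, x_nom)

# Analytic Jacobian at the same point, for comparison
r = np.hypot(x_nom[0], x_nom[1])
J_exact = np.array([[x_nom[0] / r, x_nom[1] / r],
                    [-x_nom[1] / r**2, x_nom[0] / r**2]])
print(np.max(np.abs(J_num - J_exact)))
```

The step size `eps` trades truncation error against round-off error; an analytic Jacobian, when available, avoids both.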


A mathematical derivation of the EKF similar to the one found in [22] is presented below. The final equations for the EKF appear at the end of this subsection, and they are placed in the same propagate-update structure as used for the Kalman filter equations to accentuate the similarity between the steps required to implement each filter. These equations emphasize an important distinction between the Kalman filter and the EKF: unlike in the Kalman filter equations, the state mean estimate, $\hat{x}(\cdot)$, and the state covariance estimate, $P(\cdot)$, are interdependent in the EKF equations. The interdependence of the state mean and covariance estimates stems from expanding the Taylor series about the state mean estimate, so the Jacobians, and therefore the covariance estimates, depend on the mean estimates. Also, since the true models are nonlinear, one should keep in mind that the true target state pdf is not a Gaussian pdf, as one may be led to believe by the EKF equations.

⁹ In [14], the author proposes using radar cross-section measurements to track and identify targets simultaneously. Because of the nature of the measurement equation for radar cross-section, a closed-form measurement equation does not exist. Consequently, the Jacobian of the measurement equation cannot be found.

The nonlinear, shift-invariant system dynamics and measurement models are given as

$$
\begin{aligned}
\text{System Dynamics Model:}\quad x(k) &= \phi[x(k-1)] + G_d\,w_d(k-1)\\
\text{Measurement Model:}\quad z(k) &= h[x(k)] + v(k).
\end{aligned}
\tag{2.7}
$$

If $\hat{x}(k-1\mid k-1)$ is the conditional mean of $x(k-1)$, and the estimation error $\tilde{x}(k-1)$ is described by the first- and second-order statistics

$$
\begin{aligned}
E\{\tilde{x}(k-1)\mid Z^{k-1}\} &= 0\\
E\{\tilde{x}(k-1)\,\tilde{x}(k-1)^{T}\mid Z^{k-1}\} &= P(k-1\mid k-1),
\end{aligned}
$$

then the state random process vector at sample $k-1$ is equivalently represented by $x(k-1) = \hat{x}(k-1\mid k-1) + \tilde{x}(k-1)$. In a similar manner, the propagated state random process vector at sample $k$ may be written as $x(k) = \hat{x}(k\mid k-1) + \tilde{x}(k)$, where $\tilde{x}(k)$ is zero-mean and $P(k\mid k-1)$ is its covariance. The Taylor series expansion of $\phi[x(k-1)]$ about $\hat{x}(k-1\mid k-1)$ is¹⁰


$$
\begin{aligned}
\phi[x(k-1)] &= \phi[\hat{x}(k-1\mid k-1) + \tilde{x}(k-1)]\\
&= \phi[\hat{x}(k-1\mid k-1)] + \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\tilde{x}(k-1) + \text{H.O.T.}\\
&\approx \phi[\hat{x}(k-1\mid k-1)] + \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\tilde{x}(k-1)
\end{aligned}
\tag{2.18}
$$

where H.O.T. stands for “higher-order terms,” which are neglected in the first-order approximation [3], [22]. A similar expression may be found for the first-order linear approximation to the nonlinear measurement vector function.

¹⁰ A modified form of the Taylor series is given in [3] as

$$
\phi(x+h) = \sum_{n=0}^{\infty}\frac{h^{n}}{n!}\,\phi^{(n)}(x).
$$

This form is adapted according to [22] and used in the derivation of the propagation equations for the EKF.

The propagated state conditional mean and covariance estimates may be found by substituting Equation (2.18) into the conditional expectation equations for these quantities, conditioned on the measurement history through sample $k-1$. Noting that $\tilde{x}(\cdot)$ is zero-mean, $w_d(\cdot)$ is zero-mean and uncorrelated in time, and $\phi[\hat{x}(k-1\mid k-1)]$ is a constant given $Z^{k-1}$, the propagated state mean estimate is


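$$
\begin{aligned}
\hat{x}(k\mid k-1) &= E\{x(k)\mid Z^{k-1}\}\\
&= E\{\phi[x(k-1)] + G_d\,w_d(k-1)\mid Z^{k-1}\}\\
&\approx \phi[\hat{x}(k-1\mid k-1)] + \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)} E\{\tilde{x}(k-1)\mid Z^{k-1}\} + G_d\,E\{w_d(k-1)\mid Z^{k-1}\}\\
&= \phi[\hat{x}(k-1\mid k-1)].
\end{aligned}
\tag{2.19}
$$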


In a similar manner, the propagated state random process vector covariance estimate is

$$
\begin{aligned}
P(k\mid k-1) &= E\{[x(k)-\hat{x}(k\mid k-1)][x(k)-\hat{x}(k\mid k-1)]^{T}\mid Z^{k-1}\}\\
&= E\{x(k)\,x(k)^{T}\mid Z^{k-1}\} - \hat{x}(k\mid k-1)\,\hat{x}(k\mid k-1)^{T}\\
&= E\{[\phi[x(k-1)] + G_d\,w_d(k-1)][\phi[x(k-1)] + G_d\,w_d(k-1)]^{T}\mid Z^{k-1}\} - \hat{x}(k\mid k-1)\,\hat{x}(k\mid k-1)^{T}\\
&= E\{\phi[x(k-1)]\,\phi[x(k-1)]^{T}\mid Z^{k-1}\} + E\{\phi[x(k-1)]\,w_d(k-1)^{T}G_d^{T}\mid Z^{k-1}\}\\
&\quad + E\{G_d\,w_d(k-1)\,\phi[x(k-1)]^{T}\mid Z^{k-1}\} + G_d\,E\{w_d(k-1)\,w_d(k-1)^{T}\mid Z^{k-1}\}\,G_d^{T} - \hat{x}(k\mid k-1)\,\hat{x}(k\mid k-1)^{T}.
\end{aligned}
$$
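Aside 1:

$$
\begin{aligned}
E\{\phi[x(k-1)]\,\phi[x(k-1)]^{T}\mid Z^{k-1}\}
&\approx E\{\phi[\hat{x}(k-1\mid k-1)]\,\phi[\hat{x}(k-1\mid k-1)]^{T}\mid Z^{k-1}\}\\
&\quad + E\{\phi[\hat{x}(k-1\mid k-1)]\,\tilde{x}(k-1)^{T}\mid Z^{k-1}\}\left(\left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\right)^{T}\\
&\quad + \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)} E\{\tilde{x}(k-1)\,\phi[\hat{x}(k-1\mid k-1)]^{T}\mid Z^{k-1}\}\\
&\quad + \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)} E\{\tilde{x}(k-1)\,\tilde{x}(k-1)^{T}\mid Z^{k-1}\}\left(\left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\right)^{T}\\
&= \phi[\hat{x}(k-1\mid k-1)]\,\phi[\hat{x}(k-1\mid k-1)]^{T} + 0 + 0\\
&\quad + \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)} P(k-1\mid k-1)\left(\left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\right)^{T}\\
&= \hat{x}(k\mid k-1)\,\hat{x}(k\mid k-1)^{T} + \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)} P(k-1\mid k-1)\left(\left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\right)^{T}
\end{aligned}
$$

Aside 2:

$$
\begin{aligned}
E\{\phi[x(k-1)]\,w_d(k-1)^{T}G_d^{T}\mid Z^{k-1}\} &= E\{\phi[x(k-1)]\mid Z^{k-1}\}\,E\{w_d(k-1)^{T}\mid Z^{k-1}\}\,G_d^{T} = 0\\
E\{G_d\,w_d(k-1)\,\phi[x(k-1)]^{T}\mid Z^{k-1}\} &= 0
\end{aligned}
$$

The factorization holds because $w_d(k-1)$ is zero-mean and independent of $x(k-1)$ and $Z^{k-1}$.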
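Substituting the results of Aside 1 and Aside 2, together with $E\{w_d(k-1)\,w_d(k-1)^{T}\mid Z^{k-1}\} = Q_d(k-1)$, into the expansion of $P(k\mid k-1)$ above gives

$$
\begin{aligned}
P(k\mid k-1) &= \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)} P(k-1\mid k-1)\left(\left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\right)^{T}\\
&\quad + G_d\,Q_d(k-1)\,G_d^{T} + \hat{x}(k\mid k-1)\,\hat{x}(k\mid k-1)^{T} - \hat{x}(k\mid k-1)\,\hat{x}(k\mid k-1)^{T}\\
&= \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)} P(k-1\mid k-1)\left(\left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\right)^{T} + G_d\,Q_d(k-1)\,G_d^{T}.
\end{aligned}
\tag{2.20}
$$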


Expressions for the measurement-updated state mean and covariance estimates may be derived using the same techniques used in the propagated estimates derivation.
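To see how well the first-order propagation in Equations (2.19) and (2.20) works for a mildly nonlinear model, the sketch below compares it with a Monte Carlo propagation of samples through a hypothetical scalar dynamics function $\phi(x) = x + 0.1\sin x$, with $G_d = 1$. The model, the seed, and all numerical values are assumptions chosen for illustration only.

```python
import numpy as np


def phi(x):
    """Hypothetical mildly nonlinear scalar dynamics function."""
    return x + 0.1 * np.sin(x)


def dphi_dx(x):
    """Analytic derivative (scalar Jacobian) of phi."""
    return 1.0 + 0.1 * np.cos(x)


x_hat, P, Qd = 1.0, 0.01, 0.001   # hypothetical x(k-1|k-1), P(k-1|k-1), and Qd (with Gd = 1)

# First-order (EKF) propagation, Equations (2.19) and (2.20)
x_pred_lin = phi(x_hat)
P_pred_lin = dphi_dx(x_hat) ** 2 * P + Qd

# Monte Carlo propagation of samples through the nonlinear model
rng = np.random.default_rng(0)
x_prev = rng.normal(x_hat, np.sqrt(P), size=200_000)
x_next = phi(x_prev) + rng.normal(0.0, np.sqrt(Qd), size=x_prev.size)

print(f"mean: linearized={x_pred_lin:.5f}, Monte Carlo={x_next.mean():.5f}")
print(f"var:  linearized={P_pred_lin:.6f}, Monte Carlo={x_next.var():.6f}")
```

For these values the two results agree closely; the linearization error grows as $P(k-1\mid k-1)$ or the curvature of $\phi$ increases.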

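The final EKF equations are listed below according to the EKF stage in which they are calculated. At the time propagation stage of the EKF, calculate

$$
\begin{aligned}
\Phi(k,k-1) &= \left.\frac{\partial\phi[x(k-1)]}{\partial x(k-1)}\right|_{x(k-1)=\hat{x}(k-1\mid k-1)}\\
\hat{x}(k\mid k-1) &= \phi[\hat{x}(k-1\mid k-1)]\\
P(k\mid k-1) &= \Phi(k,k-1)\,P(k-1\mid k-1)\,\Phi(k,k-1)^{T} + G_d\,Q_d(k-1)\,G_d^{T}.
\end{aligned}
\tag{2.21}
$$

In the measurement update stage of the EKF, calculate

$$
\begin{aligned}
H(k) &= \left.\frac{\partial h[x(k)]}{\partial x(k)}\right|_{x(k)=\hat{x}(k\mid k-1)}\\
K(k) &= P(k\mid k-1)\,H(k)^{T}\left[H(k)\,P(k\mid k-1)\,H(k)^{T} + R(k)\right]^{-1}\\
\hat{x}(k\mid k) &= \hat{x}(k\mid k-1) + K(k)\left[z_k - h[\hat{x}(k\mid k-1)]\right]\\
P(k\mid k) &= P(k\mid k-1) - K(k)\,H(k)\,P(k\mid k-1).
\end{aligned}
\tag{2.22}
$$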
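The EKF cycle in Equations (2.21) and (2.22) can be sketched as a single function that takes the nonlinear functions and their Jacobians as arguments. This is a minimal illustration under assumed interfaces, not the implementation used in this thesis; `ekf_step` and the scalar example model are hypothetical. Note that the dynamics Jacobian is evaluated at $\hat{x}(k-1\mid k-1)$ and the measurement Jacobian at $\hat{x}(k\mid k-1)$.

```python
import numpy as np


def ekf_step(x_est, P_est, z_k, phi, phi_jac, h, h_jac, Gd, Qd, R):
    """One EKF cycle: time propagation (2.21) followed by measurement update (2.22)."""
    # Time propagation: Jacobian of phi evaluated at x(k-1|k-1)
    Phi = phi_jac(x_est)
    x_pred = phi(x_est)
    P_pred = Phi @ P_est @ Phi.T + Gd @ Qd @ Gd.T
    # Measurement update: Jacobian of h evaluated at x(k|k-1)
    H = h_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = np.linalg.solve(S, H @ P_pred).T   # P H^T S^-1, using symmetry of P and S
    x_upd = x_pred + K @ (z_k - h(x_pred))
    P_upd = P_pred - K @ H @ P_pred
    return x_upd, P_upd


# Hypothetical scalar example: mildly nonlinear dynamics and a squared-state measurement
def phi(x):
    return x + 0.1 * np.sin(x)


def phi_jac(x):
    return np.array([[1.0 + 0.1 * np.cos(x[0])]])


def h(x):
    return np.array([x[0] ** 2])


def h_jac(x):
    return np.array([[2.0 * x[0]]])


x_upd, P_upd = ekf_step(np.array([1.0]), np.array([[0.5]]), np.array([1.5]),
                        phi, phi_jac, h, h_jac,
                        Gd=np.eye(1), Qd=np.array([[0.01]]), R=np.array([[0.1]]))
print(x_upd, P_upd)
```

When `phi` and `h` are linear, the Jacobians are constant matrices and this function reduces to the Kalman filter propagation and update stages.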
2.3 Multivariate Gaussian Mixtures

Understanding multivariate Gaussian mixtures will aid in understanding the target state pdf generated by the Bayesian solution for tracking problems in which uncertainty exists in the model parameters or in the origin of the measurements. More importantly, since Gaussian mixture reduction is the focus of this thesis, a good introduction to this subject is necessary before continuing to the next chapters.

In this thesis, a multivariate Gaussian mixture pdf is a weighted, finite sum of multivariate Gaussian pdfs. It is characterized by the number of mixture components and the weight, mean vector, and covariance matrix of each component. Since a pdf must be nonnegative, and the integral of a pdf over the sample space of the random quantity it represents must evaluate to unity, the mixture weights must be nonnegative and must sum to one. The multivariate Gaussian mixture pdf of the random vector $x$ with the parameter set $\Omega$ is represented by

$$
f(x\mid\Omega) = \sum_{i=1}^{M} p_i\, f(x\mid\mu_i, P_i)
\tag{2.23}
$$

where $M$ is the number of mixture components and $p_i$, $\mu_i$, and $P_i$ are the weight, mean vector, and covariance matrix for each component $i = 1,\dots,M$. Each multivariate Gaussian pdf has the form

$$
f(x\mid\mu_i, P_i) = \mathcal{N}\{x;\ \mu_i,\ P_i\} = \frac{\exp\left[-\frac{1}{2}(x-\mu_i)^{T}P_i^{-1}(x-\mu_i)\right]}{(2\pi)^{n/2}\sqrt{\det P_i}}
\tag{2.24}
$$

where $n$ is the dimension of the random vector $x$ and the covariance $P_i$ is symmetric positive definite. Figure 2.3 illustrates a four-component univariate Gaussian mixture pdf.
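Equations (2.23) and (2.24) translate directly into code. The following minimal sketch evaluates a Gaussian mixture pdf with NumPy and checks numerically that a univariate mixture integrates to one when its weights sum to one; the function names and the four-component parameters are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate Gaussian pdf N{x; mean, cov}, Equation (2.24)."""
    x, mean, cov = np.atleast_1d(x), np.atleast_1d(mean), np.atleast_2d(cov)
    d = x - mean
    quad = d @ np.linalg.solve(cov, d)
    norm = np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)


def mixture_pdf(x, weights, means, covs):
    """Gaussian mixture pdf, Equation (2.23): weighted sum of component pdfs."""
    return sum(p * gaussian_pdf(x, mu, P) for p, mu, P in zip(weights, means, covs))


# Hypothetical four-component univariate mixture (weights are nonnegative and sum to one)
weights = [0.1, 0.4, 0.3, 0.2]
means = [-2.0, 0.0, 1.5, 4.0]
covs = [0.5, 1.0, 0.3, 2.0]

grid = np.linspace(-15.0, 20.0, 7001)
values = np.array([mixture_pdf(g, weights, means, covs) for g in grid])
print(values.sum() * (grid[1] - grid[0]))   # numerical integral, close to 1
```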

Figure 2.3: An illustration of a four-component univariate Gaussian mixture pdf (the solid line). Note that the mixture components (represented by the dash-dotted traces) are scaled by their respective mixture weights in this graphic.

The overall mean and covariance of $x$ given in (2.23) are derived in Appendix B of [31] and in Chapter II of [38]. These statistics are reproduced in the following equations:

$$
\begin{aligned}
\mu &= \sum_{i=1}^{M} p_i\,\mu_i\\
P &= \sum_{i=1}^{M} p_i\left(P_i + \mu_i\mu_i^{T}\right) - \mu\mu^{T}\\
&= \sum_{i=1}^{M} p_i\left[P_i + (\mu_i-\mu)(\mu_i-\mu)^{T}\right]
\end{aligned}
\tag{2.25}
$$
Merging components of a target state Gaussian mixture pdf is one of two kinds of mixture reduction actions used in this thesis. In Chapter 3 of [31], Salmond derived the equations for the merged mixture component weight, mean vector, and covariance matrix resulting from merging two or more components of the original multivariate Gaussian mixture pdf. These equations were derived under the constraint that the overall mean and covariance of the original mixture are preserved [31]. The new merged component parameters are

$$
\begin{aligned}
p_m &= \sum_{i\in\mathcal{I}} p_i\\
\mu_m &= \frac{1}{p_m}\sum_{i\in\mathcal{I}} p_i\,\mu_i\\
P_m &= \frac{1}{p_m}\sum_{i\in\mathcal{I}} p_i\left(P_i + \mu_i\mu_i^{T}\right) - \mu_m\mu_m^{T}
\end{aligned}
\tag{2.26}
$$


where $i \in \mathcal{I}$ indicates that the summation is taken only over those components that are merged, and the subscript “m” differentiates the merged component parameters from the others. As an example, if mixture components 1 and 2 are merged, then the resulting merged-component weight, mean vector, and covariance matrix would be (as derived by Williams in [38])

$$
\begin{aligned}
p_{12} &= p_1 + p_2\\
\mu_{12} &= \frac{1}{p_1+p_2}\left(p_1\mu_1 + p_2\mu_2\right)\\
P_{12} &= \frac{1}{p_1+p_2}\left[p_1P_1 + p_2P_2 + \frac{p_1p_2}{p_1+p_2}(\mu_1-\mu_2)(\mu_1-\mu_2)^{T}\right].
\end{aligned}
$$

Deleting a mixture component is the second kind of mixture reduction action used in this thesis. If a component of a target state Gaussian mixture pdf is deleted, then the only additional step is to renormalize the remaining mixture weights, dividing each by their sum, so that they again sum to one.
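A minimal sketch of the deletion action, with a hypothetical function name and example values, is shown below; the surviving weights are divided by their sum.

```python
import numpy as np


def delete_component(weights, means, covs, index):
    """Remove mixture component `index` and renormalize the remaining weights to sum to one."""
    keep = [i for i in range(len(weights)) if i != index]
    w = np.asarray(weights, dtype=float)[keep]
    return w / w.sum(), [means[i] for i in keep], [covs[i] for i in keep]


w, m, c = delete_component([0.5, 0.3, 0.2], [0.0, 1.0, 2.0], [1.0, 1.0, 1.0], index=1)
print(w, m)
```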


2.4 Bayesian Approaches for Kinematics Model Parameter Uncertainty


In the previous sections, uncertainty in the discrete-time system kinematics and measurement models was represented by the white Gaussian noise process vectors $w_d(k-1)$ and $v(k)$ and by the initial conditions on the state random process vector in Equation (2.5). This section introduces another realistic source of uncertainty encountered in target tracking. Two recursive Bayesian approaches are formulated, depending on how one chooses to represent the uncertainty.

Consider the discrete-time linear system dynamics model

$$
x(k) = \Phi(k,k-1)\,x(k-1) + G_d\,w_d(k-1)
\tag{2.4}
$$

where $w_d(k)$ is zero-mean with covariance $Q_d$. This equation represents the equivalent discrete-time model of the continuous-time system describing the motion of a target over time. If this model does not adequately describe the target's motion, then the state vector estimated by a Kalman filter based on this model will likely be very inaccurate. The problem is not the Kalman filter but the assumption that the system dynamics model sufficiently describes the target's motion over the time of interest. One remedy for this situation is to use more than one model when estimating the state random process vector. The underlying assumption is that at least one of the employed models is an adequate characterization of the target's kinematics. For a designer, this raises the question, “How can this be done?”

Assuming that the form of the linear system dynamics model in Equation (2.4) is correct, the answer to this question begins with the model matrices $\Phi(k,k-1)$, $G_d$, and $Q_d$. Fundamentally, system dynamics models are defined by the elements of $\Phi(k,k-1)$, $G_d$, and $Q_d$. One way to represent the uncertainty in the kinematics model mathematically is to treat these elements as random quantities. If these random matrix elements are placed into a vector, then the vector is either a continuous random vector (i.e., its values are unknown constants) denoted by $M$ (for “non-switching” models) or a continuous random process vector $M(k)$ (for “switching” models). The difference between non-switching and switching models is that the latter type includes a temporal dependence, which admits the possibility that the target can assume different dynamics models over the tracking time of interest. However, a non-switching model-based Bayesian solution may also be used to track a target that exhibits various dynamics over a time interval of interest by including the appropriate modifications to enable model switching [23].

In general, the sample space of the random vectors representing the elements of the $\Phi(k,k-1)$, $G_d$, and $Q_d$ matrices is the $n$-dimensional real vector space $\mathbb{R}^n$: $M, M(k) \in \mathcal{M} \subseteq \mathbb{R}^n$ [22]. Conceptually, the continuous sample space $\mathcal{M}$ contains every possible model that has the form of Equation (2.4). However, since the sample space includes every possible model, an uncountably infinite number of matrix element combinations is possible, and the Bayesian solution is not well-suited to real-time application [22].

Despite this obstacle, two recursive Bayesian approaches suitable for real-time implementation may be formulated by making a finite discretization of the continuous sample space, $\mathcal{M}$, along with other modifications. Subsection 2.4.1 introduces the recursive Bayesian solution for non-switching models and the ad hoc modifications necessary for this solution to accommodate target maneuvering. Subsection 2.4.2 presents the recursive Bayesian solution for switching models and three common approximations required to make this solution computationally tractable.


2.4.1 Non-Switching Models. For non-switching models, the kinematics of the target are assumed to be adequately modeled by at least one model in the continuous sample space, $\mathcal{M}$, for all times of interest (e.g., a plane travels according to model A for all observation times of interest). In most practical target tracking applications, this constraint is unrealistic since, for example, an aircraft could move at a constant velocity for a period of time and then perform some type of maneuver while in the surveillance region of an enemy tracking system. However, two ad hoc modifications to this approach allow the models to switch over time and accommodate maneuvering targets [4], [23]. In this thesis, a non-switching model Bayesian solution that incorporates the ad hoc modifications to enable model switching is called a Multiple Model Adaptive Estimation (MMAE) algorithm.

The Bayesian solution for non-switching models is derived in this section. Initially, the model sample space $\mathcal{M}$ is the continuous vector space $\mathbb{R}^n$, but this space is then discretized such that $\mathcal{M} = \{M_i\}_{i=1}^{N_f}$, where $N_f$ is the number of models (and thus the number of elemental filters in the non-switching model algorithm), to obtain a real-time solution [22]. Care must be taken when choosing the discretization of the original sample space so that at least one model in the set $\{M_i\}_{i=1}^{N_f}$ adequately describes the kinematics of the target for all times of interest.

The derivation begins by modifying Equation (2.15) to include the random vector $M$ as a quantity to be estimated by inserting it to the left of the conditioning symbol. Assuming the joint pdf of $x(k)$, $Z^k$, and $M$ exists, the recursive Bayesian solution is

$$
\begin{aligned}
f\left(x(k), M\mid Z^{k}\right) &= \frac{f\left(x(k), M, Z^{k}\right)}{f\left(Z^{k}\right)}\\
&= \frac{f\left(x(k)\mid M, Z^{k}\right) f\left(M\mid Z^{k}\right) f\left(Z^{k}\right)}{f\left(Z^{k}\right)}\\
&= f\left(x(k)\mid M, Z^{k}\right) f\left(M\mid Z^{k}\right)
\end{aligned}
\tag{2.27}
$$

where the first pdf is the Kalman filter solution conditioned on a given model (a Gaussian pdf) and the second pdf is that of the model conditioned on the measurement history. The focus of the remainder of the derivation is on the second conditional pdf, $f(M\mid Z^{k})$.
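The model conditional pdf in Equation (2.27) evaluates to

$$
\begin{aligned}
f\left(M\mid Z^{k}\right) &= \frac{f\left(M, z(k), Z^{k-1}\right)}{f\left(z(k), Z^{k-1}\right)}\\
&= \frac{f\left(z(k)\mid M, Z^{k-1}\right) f\left(M\mid Z^{k-1}\right) f\left(Z^{k-1}\right)}{\int_{\mathcal{M}} f\left(z(k), M, Z^{k-1}\right) dM}\\
&= \frac{f\left(z(k)\mid M, Z^{k-1}\right) f\left(M\mid Z^{k-1}\right) f\left(Z^{k-1}\right)}{\int_{\mathcal{M}} f\left(z(k)\mid M, Z^{k-1}\right) f\left(M\mid Z^{k-1}\right) dM\; f\left(Z^{k-1}\right)}\\
&= \frac{f\left(z(k)\mid M, Z^{k-1}\right) f\left(M\mid Z^{k-1}\right)}{\int_{\mathcal{M}} f\left(z(k)\mid M, Z^{k-1}\right) f\left(M\mid Z^{k-1}\right) dM}
\end{aligned}
\tag{2.28}
$$

where the denominator is the marginal pdf $f(z(k)\mid Z^{k-1})$, obtained by integrating the joint pdf of $z(k)$ and $M$ (conditioned on $Z^{k-1}$) over $M$. In general, the integral in the denominator will require a computationally costly numerical solution and will likely prohibit the use of Equation (2.28) in an online implementation [22]. This problem is overcome by a finite discretization of $\mathcal{M}$.

Now, instead, let $\mathcal{M} = \{M_i\}_{i=1}^{N_f}$. Then $M$ becomes a discrete random vector, and its pdf may be written in terms of the probability mass function (pmf)¹¹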
$$
\begin{aligned}
p(M_i) &= P[M = M_i]\\
f\left(M\mid Z^{k}\right) &= \sum_{i=1}^{N_f} p\left(M = M_i\mid Z^{k}\right)\delta(M - M_i)\\
&= \sum_{i=1}^{N_f} \mu_i(k)\,\delta(M - M_i)
\end{aligned}
\tag{2.29}
$$

where $\mu_i(k) = p(M = M_i\mid Z^{k})$ is the hypothesis conditional probability, or mode probability [1], [22], [23]. This quantity represents the probability that model $i$ is correct, given the observed measurements [4], [22]. The mode probabilities are constrained by

$$
\mu_i(k) \geq 0,\quad i = 1,\dots,N_f, \qquad\text{and}\qquad \sum_{i=1}^{N_f}\mu_i(k) = 1,
$$

which will hold (as shown subsequently) as long as $\mu_i(0) \geq 0$ for all $i$ and $\sum_{i=1}^{N_f}\mu_i(0) = 1$.


¹¹ The generalized definition of the pdf allows for a pdf representation of a discrete random variable by noting that

$$
F_X(x) = \sum_{k} p_X(x_k)\,u(x - x_k)
$$

where $F_X(x)$ is the cumulative distribution function of the random variable $X$, $u(\cdot)$ is the unit step function, and $p_X(x_k)$ is the pmf of $X$ [20]. The derivative of $F_X(x)$ is, by definition, $f_X(x)$, so the pdf of a discrete random variable is [20]

$$
\frac{d}{dx}F_X(x) = f_X(x) = \sum_{k} p_X(x_k)\,\delta(x - x_k).
$$
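Substituting the expression above for $f(M\mid Z^{k})$ and $f(M\mid Z^{k-1})$ in Equation (2.28) results in

$$
\begin{aligned}
\sum_{i=1}^{N_f}\mu_i(k)\,\delta(M-M_i)
&= \frac{\sum_{i=1}^{N_f} f\left(z(k)\mid M, Z^{k-1}\right)\mu_i(k-1)\,\delta(M-M_i)}{\int_{\mathcal{M}}\sum_{i=1}^{N_f} f\left(z(k)\mid M, Z^{k-1}\right)\mu_i(k-1)\,\delta(M-M_i)\,dM}\\
&= \frac{\sum_{i=1}^{N_f} f\left(z(k)\mid M, Z^{k-1}\right)\mu_i(k-1)\,\delta(M-M_i)}{\sum_{i=1}^{N_f}\mu_i(k-1)\int_{\mathcal{M}} f\left(z(k)\mid M, Z^{k-1}\right)\delta(M-M_i)\,dM}\\
&= \frac{\sum_{i=1}^{N_f} f\left(z(k)\mid M, Z^{k-1}\right)\mu_i(k-1)\,\delta(M-M_i)}{\sum_{i=1}^{N_f}\mu_i(k-1)\,f\left(z(k)\mid M_i, Z^{k-1}\right)}
\end{aligned}
\tag{2.30}
$$

by the sifting property of the delta function. To find a particular mode probability, equate the coefficients of $\delta(M - M_j)$ on both sides of the above equation, where $M_j$ is the model of interest; in the numerator, $f(z(k)\mid M, Z^{k-1})\,\delta(M - M_j) = f(z(k)\mid M_j, Z^{k-1})\,\delta(M - M_j)$. Applying this approach for $j = 1,\dots,N_f$ yields an expression for each mode probability [4], [22], [23], [33]:

$$
\mu_j(k) = \frac{\mu_j(k-1)\,f\left(z(k)\mid M_j, Z^{k-1}\right)}{\sum_{i=1}^{N_f}\mu_i(k-1)\,f\left(z(k)\mid M_i, Z^{k-1}\right)},\qquad j = 1,\dots,N_f.
\tag{2.31}
$$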


Equation (2.31) shows that each $\mu_j(k)$ lies between zero and one as long as the initial constraints cited above are met (the $\mu_i(0)$'s sum to one and each $\mu_i(0)$ is greater than or equal to zero), because its numerator is one of the nonnegative terms summed in the denominator. Moreover, summing (2.31) over $j$ shows that the sum of all mode probabilities is one for any sample $k$.
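Given the mode likelihoods $f(z(k)\mid M_j, Z^{k-1})$ evaluated at the observed measurement, the recursion (2.31) is a single normalization. The sketch below is a minimal illustration with hypothetical names and numbers. With many models or very small likelihoods, one might instead work with log-likelihoods to avoid underflow; that is an implementation choice not discussed in this thesis.

```python
import numpy as np


def update_mode_probabilities(mu_prev, likelihoods):
    """Equation (2.31): mu_j(k) = mu_j(k-1) L_j / sum_i mu_i(k-1) L_i."""
    unnormalized = np.asarray(mu_prev, dtype=float) * np.asarray(likelihoods, dtype=float)
    return unnormalized / unnormalized.sum()   # the sum is the denominator of (2.31)


# Hypothetical three-model example: equal prior mode probabilities,
# and model 2 explains the current measurement best
mu = update_mode_probabilities([1 / 3, 1 / 3, 1 / 3], [0.05, 0.40, 0.10])
print(mu, mu.sum())
```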

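The expression $f(\mathbf{z}(k)|\mathbf{M}_j,\mathbf{Z}^{k-1})$ is the pdf of $\mathbf{z}(k)$ conditioned on the assumed mode and the observed prior measurement history. At sample $k$ the current measurement is observed (in fact, it is $\mathbf{z}_k$, a realization of the random process $\mathbf{z}(k)$ at sample $k$), and the pdf $f(\mathbf{z}(k)|\mathbf{M}_j,\mathbf{Z}^{k-1})$ becomes the likelihood function of mode $j$ given by [4, 22]

$$
\begin{aligned}
f(\mathbf{z}(k)|\mathbf{M}_j,\mathbf{Z}^{k-1})\big|_{\mathbf{z}(k)=\mathbf{z}_k} &= L(\mathbf{z}_k;\mathbf{r}_{k,j},\mathbf{S}_j(k)) = \frac{\exp\left[-\tfrac{1}{2}\,\mathbf{r}_{k,j}^{T}\mathbf{S}_j^{-1}(k)\,\mathbf{r}_{k,j}\right]}{(2\pi)^{m/2}\sqrt{\det\mathbf{S}_j(k)}} \\
\mathbf{r}_{k,j} &= \mathbf{z}_k - \mathbf{H}_j\hat{\mathbf{x}}_j(k|k-1) \\
\mathbf{S}_j(k) &= \mathbf{H}_j\mathbf{P}_j(k|k-1)\mathbf{H}_j^{T} + \mathbf{R}_j(k).
\end{aligned}
\tag{2.32}
$$

In this notation $j$ indicates the model number, $j = 1,\ldots,N_f$, so that each quantity above corresponds to one of the $N_f$ filters. Also, $m$ is the dimension of the random measurement vector $\mathbf{z}(k)$ or, equivalently, the dimension of $\mathbf{r}_{k,j}$.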
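The likelihood in Equation (2.32) can be evaluated as in the following sketch, which assumes a linear measurement model and that the predicted estimate $\hat{\mathbf{x}}_j(k|k-1)$ and covariance $\mathbf{P}_j(k|k-1)$ are already available from the $j$th filter. The function name `gaussian_likelihood` is hypothetical; computing the quadratic form with `numpy.linalg.solve` instead of an explicit inverse is a common numerical choice, not something the text prescribes.

```python
import numpy as np

def gaussian_likelihood(z, H, x_pred, P_pred, R):
    """Likelihood L(z_k; r_{k,j}, S_j(k)) of Equation (2.32) for a single model j.

    Returns the likelihood value, the residual r_{k,j}, and its covariance S_j(k).
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    x_pred = np.atleast_1d(np.asarray(x_pred, dtype=float))
    H = np.atleast_2d(np.asarray(H, dtype=float))
    P_pred = np.atleast_2d(np.asarray(P_pred, dtype=float))
    R = np.atleast_2d(np.asarray(R, dtype=float))

    r = z - H @ x_pred                        # residual r_{k,j}
    S = H @ P_pred @ H.T + R                  # residual covariance S_j(k)
    m = r.shape[0]                            # measurement dimension m
    q = float(r @ np.linalg.solve(S, r))      # r^T S^{-1} r without forming the inverse
    L = np.exp(-0.5 * q) / ((2.0 * np.pi) ** (m / 2.0) * np.sqrt(np.linalg.det(S)))
    return L, r, S

# Two-dimensional check: r = [1, 0] and S = I, so r^T S^{-1} r = 1.
L, r, S = gaussian_likelihood(z=[1.0, 0.0], H=np.eye(2), x_pred=[0.0, 0.0],
                              P_pred=np.zeros((2, 2)), R=np.eye(2))
# L equals exp(-0.5) / (2 * pi)
```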

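To complete the recursive Bayesian estimator derivation for the case of non-switching models, substitute (2.29) and (2.15) into (2.27):

$$
\begin{aligned}
f(\mathbf{x}(k),\mathbf{M}|\mathbf{Z}^{k}) &= f(\mathbf{x}(k)|\mathbf{M},\mathbf{Z}^{k})\sum_{i=1}^{N_f}\mu_i(k)\,\delta(\mathbf{M}-\mathbf{M}_i) \\
&= \sum_{i=1}^{N_f}\mu_i(k)\,f(\mathbf{x}(k)|\mathbf{M}_i,\mathbf{Z}^{k}) \\
&= \sum_{i=1}^{N_f}\mu_i(k)\,\frac{f(\mathbf{z}(k)|\mathbf{x}(k),\mathbf{M}_i,\mathbf{Z}^{k-1})}{f(\mathbf{z}(k)|\mathbf{M}_i,\mathbf{Z}^{k-1})}\,\mathcal{N}\{\mathbf{x}(k);\hat{\mathbf{x}}_i(k|k-1),\mathbf{P}_i(k|k-1)\} \\
&= \sum_{i=1}^{N_f}\mu_i(k)\,\mathcal{N}\{\mathbf{x}(k);\hat{\mathbf{x}}_i(k|k),\mathbf{P}_i(k|k)\}.
\end{aligned}
\tag{2.33}
$$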
2=1 
Equation (2.33) has useful interpretations in the context of the Kalman filter derivation in Subsection 2.2.1 and the multivariate Gaussian mixture introduction in Section 2.3. The last two lines of (2.33) indicate that the time propagation and measurement update stages of the non-switching model algorithm consist of $N_f$ single-model Kalman filter time propagation and measurement update stages. Thus, $N_f$ Kalman filters are needed, and each filter recursively operates on its own state random process vector estimates. The last line of Equation (2.33) is a Gaussian mixture since it is a weighted sum of Gaussian pdfs and the constraints on $\mu_i(k)$ are the same constraints imposed on $\mu_i(0)$. Each mixture component, which is the weighted output of a Kalman filter with a distinct system dynamics model, may be interpreted as corresponding to the hypothesis that model $i$ is correct. In this sense, the Bayesian solution for kinematics model uncertainty evaluates hypotheses about which model in the design best matches the target kinematics given the observed measurements.

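Figure 2.4 depicts a block diagram of the non-switching model Bayesian solution inferred from Equation (2.33). Notice that each filter operates recursively on its own mean and covariance estimates. The overall mean and covariance of the state vector are

$$
\begin{aligned}
\hat{\mathbf{x}}(k|k) &= \sum_{i=1}^{N_f}\mu_i(k)\,\hat{\mathbf{x}}_i(k|k) \\
\mathbf{P}(k|k) &= \sum_{i=1}^{N_f}\mu_i(k)\left[\mathbf{P}_i(k|k) + \hat{\mathbf{x}}_i(k|k)\hat{\mathbf{x}}_i(k|k)^{T}\right] - \hat{\mathbf{x}}(k|k)\hat{\mathbf{x}}(k|k)^{T} \\
&= \sum_{i=1}^{N_f}\mu_i(k)\left[\mathbf{P}_i(k|k) + \left(\hat{\mathbf{x}}_i(k|k)-\hat{\mathbf{x}}(k|k)\right)\left(\hat{\mathbf{x}}_i(k|k)-\hat{\mathbf{x}}(k|k)\right)^{T}\right]
\end{aligned}
\tag{2.34}
$$

where the state random process vector mean estimate is given by $\hat{\mathbf{x}}(k|k)$ and the state random process vector covariance estimate is $\mathbf{P}(k|k)$.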
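The moment-matching step of Equation (2.34) is sketched below. `combine_mixture` is a hypothetical helper name; it evaluates the second (centered) form of $\mathbf{P}(k|k)$. The usage line checks a case that can be verified by hand: two equally weighted unit-variance components at $-1$ and $+1$ give an overall mean of $0$ and an overall variance of $1 + 1 = 2$.

```python
import numpy as np

def combine_mixture(weights, means, covs):
    """Overall mean and covariance of a Gaussian mixture, Equation (2.34).

    weights : (N,)      mixture weights mu_i(k), assumed to sum to one
    means   : (N, n)    component means x_i(k|k)
    covs    : (N, n, n) component covariances P_i(k|k)
    """
    w = np.asarray(weights, dtype=float)
    X = np.asarray(means, dtype=float)
    P = np.asarray(covs, dtype=float)
    x_bar = w @ X                                   # sum_i mu_i x_i
    d = X - x_bar                                   # spread of each mean about x_bar
    P_bar = np.einsum("i,ijk->jk", w, P) + np.einsum("i,ij,ik->jk", w, d, d)
    return x_bar, P_bar

x_bar, P_bar = combine_mixture([0.5, 0.5], [[-1.0], [1.0]], [[[1.0]], [[1.0]]])
# x_bar is [0.0] and P_bar is [[2.0]]
```

The same helper applies whenever a Gaussian mixture is collapsed to a single Gaussian with matching mean and covariance.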

Figure 2.4: A block diagram of the non-switching multiple model algorithm. When certain ad hoc modifications are made to the non-switching model algorithm to enable model switching, the algorithm is called MMAE.

As noted, the non-switching model Bayesian solution for kinematics model parameter uncertainty does not account for the realistic possibility of target maneuvering. This shortfall is evident in the recursive mode probability calculation given in Equation (2.31). Because $\mu_j(k)$ is proportional to $\mu_j(k-1)$, if any of the mode probabilities reaches zero, then that mode probability will remain zero even if future measurements indicate (through the likelihood function calculation in Equation (2.32)) that the corresponding model is the best match to the current target dynamics. However, by introducing an artificial nonzero lower bound on the mode probabilities into Equation (2.31), this shortfall is overcome [4, 22, 23, 58]. Consider a scenario in which a target's dynamics over a long period of time are best represented by the first of two models, so that the mode probability of the unfavorable (second) filter is at the lower bound. Now imagine that the target assumes a new trajectory over an extended period of time that is best described by the second model. Then the observed measurements, through the likelihood function in Equation (2.32), would indicate that the second model is a more favorable match to the target's present dynamics. Since the mode probability of the second filter is at the lower bound and not zero, it will increase over time to favor the hypothesis that the target's motion is best described by the second model. A second ad hoc modification improves the response time of the algorithm to changes in target dynamics. Re-initializing the estimates of divergent filters allows the mode probability of an unfavorable filter to increase more quickly if the observed measurements indicate that this filter is a good match. If the scalar quadratic form (chi-square variable) $\mathbf{r}_{k,j}^{T}\mathbf{S}_j^{-1}(k)\,\mathbf{r}_{k,j}$ in Equation (2.32) is substantially greater than $m$, then that elemental filter can be declared divergent and restarted with the state estimate of Equation (2.34) computed without the divergent elemental filter's contribution (with the mode probabilities rescaled so that they sum to one). MMAE implements mode probability lower bounding and divergent filter re-initialization to modify the non-switching model Bayesian solution for kinematics model uncertainty to accommodate maneuvering targets.
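The two MMAE modifications described above can be sketched as follows. The floor `mu_min`, the threshold multiplier `factor`, and both function names are hypothetical design choices (the text only requires the bound to be nonzero and the chi-square statistic to be "substantially greater than $m$"). Applying the floor by clipping and then renormalizing is one simple option among several; after renormalization a bounded probability may sit marginally below `mu_min`. The re-initialization of a divergent filter itself is not shown.

```python
import numpy as np

def bounded_mode_update(mu_prev, likelihoods, mu_min=1e-3):
    """Equation (2.31) followed by a hypothetical artificial floor mu_min on each mode probability."""
    mu = np.asarray(mu_prev, dtype=float) * np.asarray(likelihoods, dtype=float)
    mu = mu / mu.sum()
    mu = np.maximum(mu, mu_min)            # never let a hypothesis reach exactly zero
    return mu / mu.sum()                   # rescale so the probabilities sum to one

def is_divergent(r, S, factor=10.0):
    """Declare divergence when r^T S^{-1} r exceeds factor * m (factor is a design choice)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    S = np.atleast_2d(np.asarray(S, dtype=float))
    q = float(r @ np.linalg.solve(S, r))   # chi-square statistic with m degrees of freedom
    return q > factor * r.shape[0]

# Model 1 dominated for a long time; now the measurements favor model 2.
mu = np.array([1.0, 0.0])
for _ in range(20):
    mu = bounded_mode_update(mu, [0.1, 1.0])
# mu[1] has climbed from the floor to nearly one
```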

2.4.2 Switching Models. The switching model Bayesian solution for kinematics model uncertainty addresses the possibility of a maneuvering target up front by mathematically modeling the uncertainty in the appropriate elements of the state transition, noise input, and model noise process vector covariance matrices as random processes. It was shown in the previous subsection that allowing the model vector to have a continuous sample space would not likely lead to a real-time implementation, so the sample space of the model random process vector, $\mathbf{M}(k)$, is restricted to a finite discrete set for all values of $k$ [4]. Since $\mathbf{M}(k)$ is now discrete-valued, let $\mathbf{M}(k) = \mathbf{M}_{i_k}$. One drawback of representing the model vector as a random process is that the Bayesian solution must be approximated to obtain a computationally tractable algorithm. The recursive Bayesian solution in the presence of kinematics model parameter uncertainty is derived in this subsection, and three approximate solutions are developed in subsequent subsections.

As a starting point in finding a recursive Bayesian solution to this problem, consider the joint conditional pdf of the unknown target state random process vector and the unknown model random process vector, $f(\mathbf{x}(k),\mathbf{M}(k)|\mathbf{Z}^{k})$. After an application of the law of conditional probability for pdfs, this density becomes

$$
f(\mathbf{x}(k),\mathbf{M}(k)|\mathbf{Z}^{k}) = f(\mathbf{x}(k)|\mathbf{M}(k),\mathbf{Z}^{k})\,f(\mathbf{M}(k)|\mathbf{Z}^{k}) \tag{2.35}
$$

which is similar to the joint conditional pdf in the non-switching model case (Equation (2.27)); that is the purpose of writing out Equation (2.35). However, the remainder of this subsection shows that one must consider the model random process vector at all time instants through sample $k$ (the model history), and not just $\mathbf{M}(k)$ itself, to evaluate the pdfs on the right-hand side of Equation (2.35) readily by means of a Kalman filter. Since $\mathbf{M}(k)$ is a discrete random vector at sample $k$, the second pdf is written as

$$
f(\mathbf{M}(k)|\mathbf{Z}^{k}) = \sum_{i_k=1}^{N_f} p(\mathbf{M}(k)=\mathbf{M}_{i_k}|\mathbf{Z}^{k})\,\delta(\mathbf{M}(k)-\mathbf{M}_{i_k}) \tag{2.36}
$$

where the subscript on the summation index emphasizes that it corresponds to the discrete random vector $\mathbf{M}(k)$ at sample $k$. Since a recursive solution is sought so that a Kalman filter may be applied to this problem, a relationship between $\mathbf{M}(k)$ and $\mathbf{M}(k-1), \mathbf{M}(k-2), \ldots, \mathbf{M}(1)$ is needed. By using the fundamental definitions of discrete random processes and marginal pdfs [20], this pdf may be represented as (replacing $\mathbf{M}(k) = \mathbf{M}_{i_k}$ with simply $\mathbf{M}_{i_k}$)

$$
\begin{aligned}
f(\mathbf{M}(k)|\mathbf{Z}^{k}) &= \sum_{i_k=1}^{N_f}\left(\sum_{i_{k-1}=1}^{N_f}\cdots\sum_{i_1=1}^{N_f} p(\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\ldots,\mathbf{M}_{i_1}|\mathbf{Z}^{k})\right)\delta(\mathbf{M}(k)-\mathbf{M}_{i_k}) \\
&= \sum_{i_k=1}^{N_f}\left(\sum_{i_{k-1}=1}^{N_f}\cdots\sum_{i_1=1}^{N_f} p(\mathbf{M}_{i_k},\{\mathbf{M}_{i_l}\}_{l=1}^{k-1}|\mathbf{Z}^{k})\right)\delta(\mathbf{M}(k)-\mathbf{M}_{i_k})
\end{aligned}
\tag{2.37}
$$

which is, in fact, the marginal pdf of the joint pdf $f(\mathbf{M}(k),\ldots,\mathbf{M}(1)|\mathbf{Z}^{k})$. The summation index $i_{k-1}$ indicates that the summation is over the discrete-valued random vector $\mathbf{M}(k-1) = \mathbf{M}_{i_{k-1}}$, which has a sample space of $\{\mathbf{M}_{i_{k-1}}\}_{i_{k-1}=1}^{N_f}$. A similar notational convention applies to the other subscripted $i$'s. Since each summation contains $N_f$ terms and there are $k$ summations, determining $f(\mathbf{M}(k)|\mathbf{Z}^{k})$ requires evaluating $(N_f)^k$ terms. As time increases (i.e., $k$ increases), the number of evaluations grows without bound and this solution becomes computationally intractable.

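Still, the joint pmf $p(\mathbf{M}_{i_k},\{\mathbf{M}_{i_l}\}_{l=1}^{k-1}|\mathbf{Z}^{k})$ may be expanded using the law of conditional probability for pmfs in an attempt to reduce this expression to a recursive form:

$$
p(\mathbf{M}_{i_k},\{\mathbf{M}_{i_l}\}_{l=1}^{k-1}|\mathbf{Z}^{k}) = \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\{\mathbf{M}_{i_l}\}_{l=1}^{k-1},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_k}|\{\mathbf{M}_{i_l}\}_{l=1}^{k-1},\mathbf{Z}^{k-1})\,p(\{\mathbf{M}_{i_l}\}_{l=1}^{k-1}|\mathbf{Z}^{k-1})}{f(\mathbf{z}(k)|\mathbf{Z}^{k-1})} \tag{2.38}
$$

Observe that the measurement pdf $f(\mathbf{z}(k)|\cdot)$ as well as the model pmf $p(\mathbf{M}_{i_k}|\cdot)$ in the numerator of this equation are conditioned on the entire model history, $\{\mathbf{M}_{i_l}\}_{l=1}^{k-1}$. Thus, a recursive solution would require knowledge of the model history over all previous samples.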


However, these observations lead to two possible remedies to the problem of the exponential growth of evaluations. First, if $\mathbf{M}_{i_k}$ is assumed to be a Markov process, then $p(\mathbf{M}_{i_k}|\{\mathbf{M}_{i_l}\}_{l=1}^{k-1},\mathbf{Z}^{k-1})$ becomes $p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})$ (a transition probability from state $\mathbf{M}_{i_{k-1}}$ to state $\mathbf{M}_{i_k}$), and likewise for the other values of $k$.¹² By the definition of a Markov process, the transition probability $p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})$ depends solely on the previous state, so the conditioning on $\mathbf{Z}^{k-1}$ may be dropped; however, this conditioning will remain explicit in the notation for clarity. The second remedy is to limit the model histories for the measurement pdf above to the current sample, or to the current and previous time instants. Combining the Markov process assumption for $\mathbf{M}_{i_k}$ with limiting the model histories for the measurement pdfs results in the Generalized Pseudo-Bayesian (GPB) and Interacting Multiple Model (IMM) algorithms. Further approximations are necessary to produce practical real-time algorithms from GPB and IMM. These algorithms are derived in the following subsections.
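The exponential growth noted for Equation (2.37), and how the Markov assumption removes it, can be checked numerically. The sketch below ignores measurements and compares a brute-force sum over all $(N_f)^k$ model histories with a one-step-at-a-time recursion. Both computations use the Markov factorization; the point is that, once that assumption holds, the $(N_f)^k$-term marginalization collapses to a recursion costing about $N_f^2$ operations per sample. The transition matrix `T` (with `T[a, b]` $= p(\mathbf{M}(k)=\mathbf{M}_a|\mathbf{M}(k-1)=\mathbf{M}_b)$) and the initial probabilities are hypothetical values.

```python
import itertools
import numpy as np

T = np.array([[0.95, 0.10],     # T[a, b] = p(M(k)=M_a | M(k-1)=M_b); columns sum to one
              [0.05, 0.90]])
mu0 = np.array([0.6, 0.4])      # hypothetical p(M(1)) at the first sample
N_f, k = T.shape[0], 8

# Brute force: sum the probability of every model history (M(1), ..., M(k)).
brute = np.zeros(N_f)
n_histories = 0
for history in itertools.product(range(N_f), repeat=k):
    p = mu0[history[0]]
    for prev, cur in zip(history[:-1], history[1:]):
        p *= T[cur, prev]
    brute[history[-1]] += p
    n_histories += 1

# Markov recursion: propagate the marginal one sample at a time.
recursive = mu0.copy()
for _ in range(k - 1):
    recursive = T @ recursive

print(n_histories)                     # 256, i.e. (N_f)^k
print(np.allclose(brute, recursive))   # True
```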


2.4.2.1 Generalized Pseudo-Bayesian-1 Algorithm. The GPB-1 algorithm uses the Markov process assumption for $\mathbf{M}_{i_k}$ and limits the model history conditioning of the measurement pdf to the current sample $k$ [4]. It also approximates the measurement history through sample $k-1$ by the combined state random process vector mean and covariance estimates from the previous cycle. Like the non-switching model algorithm, GPB-1 requires $N_f$ models, or filters, but model switches are enabled by the initial uncertainty modeling assumption given in Subsection 2.4.2. A derivation of the GPB-1 algorithm is contained in this subsection.

¹² By the definition of a Markov process, if $x(k)$ is a discrete Markov process [20, 21], then $p(x(k)|x(k-1),\ldots,x(0)) = p(x(k)|x(k-1))$.

The GPB-1 derivation begins by using Equations (2.35) and (2.36) in their current form. Equation (2.38) is approximated using the Markov process assumption and by discarding the conditioning on the previous model history in the measurement pdf:

$$
p(\mathbf{M}_{i_k},\{\mathbf{M}_{i_l}\}_{l=1}^{k-1}|\mathbf{Z}^{k}) \approx \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})}{f(\mathbf{z}(k)|\mathbf{Z}^{k-1})} \tag{2.39}
$$


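Substituting this expression into Equation (2.36) yields

$$
\begin{aligned}
f(\mathbf{M}(k)|\mathbf{Z}^{k}) &= \sum_{i_k=1}^{N_f} p(\mathbf{M}(k)=\mathbf{M}_{i_k}|\mathbf{Z}^{k})\,\delta(\mathbf{M}(k)-\mathbf{M}_{i_k}) \\
&\approx \sum_{i_k=1}^{N_f}\left(\sum_{i_{k-1}=1}^{N_f}\frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})}{f(\mathbf{z}(k)|\mathbf{Z}^{k-1})}\right)\delta(\mathbf{M}(k)-\mathbf{M}_{i_k})
\end{aligned}
\tag{2.40}
$$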


The inner summation term is defined as the mode probability $\mu_{i_k}(k) = p(\mathbf{M}_{i_k}|\mathbf{Z}^{k})$, and the term $p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})$ is $\mu_{i_{k-1}}(k-1)$, which leads to the recursion initially sought [4]. Additionally, $p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1}) = \Gamma_{i_k,i_{k-1}}$ is the mode transition probability, which is simply the state transition probability for discrete Markov chains (a consequence of the Markov process approximation) [4]. The mode transition probability is chosen by the designer based on engineering insight.

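Explicitly, the mode probabilities at sample $k$ are given by

$$
\mu_{i_k}(k) = p(\mathbf{M}_{i_k}|\mathbf{Z}^{k}) \approx \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\sum_{i_{k-1}=1}^{N_f}\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}{f(\mathbf{z}(k)|\mathbf{Z}^{k-1})} \tag{2.41}
$$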
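for $i_k = 1,\ldots,N_f$. The term in the denominator is a normalization factor and may be found by using the property of marginal pdfs/pmfs as

$$
\begin{aligned}
f(\mathbf{z}(k)|\mathbf{Z}^{k-1}) &= \sum_{i_k=1}^{N_f}\frac{f(\mathbf{z}(k),\mathbf{M}_{i_k},\mathbf{Z}^{k-1})}{f(\mathbf{Z}^{k-1})} \\
&= \sum_{i_k=1}^{N_f}\frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_k}|\mathbf{Z}^{k-1})\,f(\mathbf{Z}^{k-1})}{f(\mathbf{Z}^{k-1})} \\
&= \sum_{i_k=1}^{N_f} f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_k}|\mathbf{Z}^{k-1}).
\end{aligned}
\tag{2.42}
$$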


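Now, the term $p(\mathbf{M}_{i_k}|\mathbf{Z}^{k-1})$ is

$$
\begin{aligned}
p(\mathbf{M}_{i_k}|\mathbf{Z}^{k-1}) &= \sum_{i_{k-1}=1}^{N_f}\frac{f(\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})}{f(\mathbf{Z}^{k-1})} \\
&= \sum_{i_{k-1}=1}^{N_f}\frac{p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})\,f(\mathbf{Z}^{k-1})}{f(\mathbf{Z}^{k-1})} \\
&= \sum_{i_{k-1}=1}^{N_f}\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1).
\end{aligned}
\tag{2.43}
$$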


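The final expression for the mode probabilities at sample $k$ is now seen as

$$
\mu_{i_k}(k) = \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\sum_{i_{k-1}=1}^{N_f}\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}{\sum_{i_k=1}^{N_f} f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\sum_{i_{k-1}=1}^{N_f}\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)} \tag{2.44}
$$

for $i_k = 1,\ldots,N_f$, which ensures that the sum of all mode probabilities is one. The term $f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})$ is simply the likelihood function given in Equation (2.32), but with $i_k$ replacing $j$.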
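A sketch of Equation (2.44) follows. `switching_mode_probabilities` is a hypothetical name, `T[i_k, i_{k-1}]` stores $\Gamma_{i_k,i_{k-1}}$, and the numbers are illustrative. With `T` equal to the identity matrix (no switching allowed), the expression reduces to Equation (2.31); with nonzero off-diagonal transition probabilities, a mode whose previous probability was zero can regain probability.

```python
import numpy as np

def switching_mode_probabilities(mu_prev, T, likelihoods):
    """Mode probabilities of Equation (2.44).

    mu_prev     : (N_f,) previous mode probabilities mu_{i_{k-1}}(k-1)
    T           : (N_f, N_f) transition matrix, T[i_k, i_{k-1}] = Gamma_{i_k, i_{k-1}}
    likelihoods : (N_f,) values f(z(k) | M_{i_k}, Z^{k-1}) at z(k) = z_k
    """
    predicted = np.asarray(T, dtype=float) @ np.asarray(mu_prev, dtype=float)  # Equation (2.43)
    numerators = np.asarray(likelihoods, dtype=float) * predicted
    return numerators / numerators.sum()       # the sum plays the role of Equation (2.42)

mu_prev = np.array([0.5, 0.5, 0.0])
lik = np.array([0.2, 0.6, 5.0])
mu_static = switching_mode_probabilities(mu_prev, np.eye(3), lik)   # no switching: same as (2.31)
T = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])                                   # hypothetical Gamma values
mu_switch = switching_mode_probabilities(mu_prev, T, lik)           # third mode can now recover
```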
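Substituting Equation (2.44) into Equations (2.40) and (2.35) yields

$$
\begin{aligned}
f(\mathbf{x}(k),\mathbf{M}(k)|\mathbf{Z}^{k}) &= f(\mathbf{x}(k)|\mathbf{M}(k),\mathbf{Z}^{k})\,f(\mathbf{M}(k)|\mathbf{Z}^{k}) \\
&\approx f(\mathbf{x}(k)|\mathbf{M}(k),\mathbf{Z}^{k})\sum_{i_k=1}^{N_f}\mu_{i_k}(k)\,\delta(\mathbf{M}(k)-\mathbf{M}_{i_k}) \\
&= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\,f(\mathbf{x}(k)|\mathbf{M}_{i_k},\mathbf{z}(k),\mathbf{Z}^{k-1}).
\end{aligned}
\tag{2.45}
$$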

A final approximation is made by letting $\hat{\mathbf{x}}(k-1|k-1)$ and $\mathbf{P}(k-1|k-1)$ represent the information in the measurement history $\mathbf{Z}^{k-1}$, so that the overall mean and covariance estimates from the previous cycle are propagated through each model; that is,

$$
f(\mathbf{x}(k)|\mathbf{M}_{i_k},\mathbf{z}(k),\mathbf{Z}^{k-1}) \approx f(\mathbf{x}(k)|\mathbf{M}_{i_k},\mathbf{z}(k),\hat{\mathbf{x}}(k-1|k-1),\mathbf{P}(k-1|k-1)). \tag{2.46}
$$

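Finally, the target state pdf, under the GPB-1 assumption that $\mathbf{M}_{i_k}$ is a Markov process and using the approximations (2.39) and (2.46), is

$$
\begin{aligned}
f(\mathbf{x}(k),\mathbf{M}(k)|\mathbf{Z}^{k}) &\approx \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\,f(\mathbf{x}(k)|\mathbf{M}_{i_k},\mathbf{z}(k),\hat{\mathbf{x}}(k-1|k-1),\mathbf{P}(k-1|k-1)) \\
&= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\,\mathcal{N}\{\mathbf{x}(k);\hat{\mathbf{x}}_{i_k}(k|k),\mathbf{P}_{i_k}(k|k)\}
\end{aligned}
\tag{2.47}
$$

with an overall mean and covariance given by

$$
\begin{aligned}
\hat{\mathbf{x}}(k|k) &= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\,\hat{\mathbf{x}}_{i_k}(k|k) \\
\mathbf{P}(k|k) &= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\left[\mathbf{P}_{i_k}(k|k) + \hat{\mathbf{x}}_{i_k}(k|k)\hat{\mathbf{x}}_{i_k}(k|k)^{T}\right] - \hat{\mathbf{x}}(k|k)\hat{\mathbf{x}}(k|k)^{T} \\
&= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\left[\mathbf{P}_{i_k}(k|k) + \left(\hat{\mathbf{x}}_{i_k}(k|k)-\hat{\mathbf{x}}(k|k)\right)\left(\hat{\mathbf{x}}_{i_k}(k|k)-\hat{\mathbf{x}}(k|k)\right)^{T}\right].
\end{aligned}
\tag{2.48}
$$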
Equation (2.47) looks very similar to Equation (2.33) in that it represents a Gaussian mixture, but unlike that equation, the GPB-1 state vector pdf shows that the $N_f$ models recursively operate on the combined state random process vector mean and covariance estimates. This fact is evident in the parameters of the Gaussian pdfs in these two equations. For the non-switching model state vector pdf, the parameters of the Gaussian pdf inside the summation are $\hat{\mathbf{x}}_i(k|k)$ and $\mathbf{P}_i(k|k)$, where the subscript $i$ indicates that these values are from the $i$th filter. Thus the non-switching algorithm requires each filter to operate recursively on its own estimates. In contrast, the Gaussian pdfs in (2.47) are computed from $\hat{\mathbf{x}}(k-1|k-1)$ and $\mathbf{P}(k-1|k-1)$ (the combined mean and covariance estimates of the target state vector from the previous cycle), which shows that each filter recursively operates on the combined estimates of all filters. Additionally, Equation (2.47) includes the possibility of a model switch at any given time. That is, if the target changes from model $\mathbf{M}_3$ at time $k-1$ to model $\mathbf{M}_5$ at time $k$, then this model switch is characterized by the model transition probabilities, $\Gamma_{i_k,i_{k-1}}$, contained in $\mu_{i_k}(k)$. One potential drawback of GPB-1, and of any of the switching model algorithms (GPB-1, GPB-2, and IMM), is that the model transition probabilities, $\Gamma_{i_k,i_{k-1}}$, must be known. If these probabilities are not provided to a designer, then the designer must assign their values ad hoc. This last point demonstrates that ad hoc adjustments may be necessary for a practical implementation of the switching model algorithms, and that a practical Bayesian solution may require ad hoc modifications regardless of the initial assumption about the nature of the model vector (i.e., whether the model vector is represented as a random vector or a random process vector).

Figure 2.5 is a graphical representation of Equation (2.47). As previously noted, each filter runs on the combined target state vector estimates. Also note that the subscripts on the mean and covariance estimates of the state random process vector are the index values of the summation in Equation (2.47).
Figure 2.5: A block diagram of the GPB-1 algorithm.
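One complete GPB-1 processing cycle, following Equations (2.44), (2.46), and (2.48) and the structure of Figure 2.5, might look like the sketch below. It assumes linear elemental Kalman filters described by hypothetical tuples `(Phi, Q, H, R)`; `kalman_step` and `gpb1_cycle` are illustrative names, and the covariance update uses the simple $(\mathbf{I}-\mathbf{K}\mathbf{H})\mathbf{P}$ form. The usage check relies on a property that is easy to verify by hand: when all models are identical, GPB-1 behaves like a single Kalman filter.

```python
import numpy as np

def kalman_step(x, P, Phi, Q, H, R, z):
    """Linear Kalman filter propagation and update; also returns the likelihood of (2.32)."""
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q
    r = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    lik = np.exp(-0.5 * (r @ np.linalg.solve(S, r))) / np.sqrt((2 * np.pi) ** len(r) * np.linalg.det(S))
    return x_pred + K @ r, (np.eye(len(x)) - K @ H) @ P_pred, lik

def gpb1_cycle(x_comb, P_comb, mu_prev, T, models, z):
    """One GPB-1 cycle: every model restarts from the combined estimate, as in (2.46)."""
    outs = [kalman_step(x_comb, P_comb, *model, z) for model in models]
    xs = np.array([o[0] for o in outs])
    Ps = np.array([o[1] for o in outs])
    lik = np.array([o[2] for o in outs])
    mu = lik * (T @ mu_prev)                 # numerator of Equation (2.44)
    mu = mu / mu.sum()
    x_new = mu @ xs                          # Equation (2.48), mean
    d = xs - x_new
    P_new = np.einsum("i,ijk->jk", mu, Ps) + np.einsum("i,ij,ik->jk", mu, d, d)
    return x_new, P_new, mu

# Two identical hypothetical scalar models (Phi, Q, H, R); T[i_k, i_{k-1}] = Gamma.
one, zero = np.eye(1), np.zeros((1, 1))
models = [(one, zero, one, one), (one, zero, one, one)]
T = np.array([[0.9, 0.1], [0.1, 0.9]])
x, P, mu = gpb1_cycle(np.array([0.0]), np.array([[1.0]]), np.array([0.5, 0.5]), T, models, np.array([2.0]))
# identical models behave like one Kalman filter: x = [1.0], P = [[0.5]], mu = [0.5, 0.5]
```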


2.4.2.2 Generalized Pseudo-Bayesian-2 Algorithm. Like GPB-1, the GPB-2 algorithm uses the Markov process assumption for $\mathbf{M}_{i_k}$, but now the measurement pdf is conditioned on the model vector at sample $k-1$ in addition to the current model vector at sample $k$ [4]. Since the measurement pdf conditioning includes the model vector at samples $k$ and $k-1$, the algorithm requires $N_f^2$ filters to operate. GPB-2 approximates the measurement and model histories through sample $k-1$ by the weighted sum of the state random process vector mean and covariance estimates from the first set of filters (which will become clear when Equation (2.53) and Figure 2.6 are introduced). GPB-2 typically outperforms GPB-1, but at the expense of using $N_f^2$ filters as opposed to only $N_f$ filters [4].

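One assumption and two approximations to the switching model derivation are used to develop the GPB-2 algorithm:

1. $\mathbf{M}_{i_k}$ is assumed to be a Markov process.

2. The model pmf conditioned on the measurement history is approximated by conditioning the measurement pdf on the model vectors $\mathbf{M}_{i_k}$ and $\mathbf{M}_{i_{k-1}}$ while discarding the model history at previous samples. Combining this approximation and condition 1 results in condition 2:

$$
p(\mathbf{M}_{i_k},\{\mathbf{M}_{i_l}\}_{l=1}^{k-1}|\mathbf{Z}^{k}) \approx \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})}{f(\mathbf{z}(k)|\mathbf{Z}^{k-1})}
$$

In comparison, the corresponding GPB-1 approximation in Equation (2.39) only conditions the measurement pdf on the current model vector $\mathbf{M}_{i_k}$.

3. The measurement and target state pdfs are approximated by letting the state random process vector mean and covariance estimates from the previous sample represent the information contained in $\mathbf{M}_{i_{k-1}}$ and $\mathbf{Z}^{k-1}$. That is,

$$
\begin{aligned}
f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1}) &\approx f(\mathbf{z}(k)|\mathbf{M}_{i_k},\hat{\mathbf{x}}_{i_{k-1}}(k-1|k-1),\mathbf{P}_{i_{k-1}}(k-1|k-1)) \\
f(\mathbf{x}(k)|\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k}) &\approx f(\mathbf{x}(k)|\mathbf{z}(k),\mathbf{M}_{i_k},\hat{\mathbf{x}}_{i_{k-1}}(k-1|k-1),\mathbf{P}_{i_{k-1}}(k-1|k-1))
\end{aligned}
$$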

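As before, a recursive Bayesian solution is desired, so (2.35) is used, but now $\mathbf{M}(k-1)$ is included in the target state pdf:

$$
f(\mathbf{x}(k),\mathbf{M}(k),\mathbf{M}(k-1)|\mathbf{Z}^{k}) = f(\mathbf{x}(k)|\mathbf{M}(k),\mathbf{M}(k-1),\mathbf{Z}^{k})\,f(\mathbf{M}(k-1)|\mathbf{M}(k),\mathbf{Z}^{k})\,f(\mathbf{M}(k)|\mathbf{Z}^{k}). \tag{2.49}
$$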

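Since $\mathbf{M}(k)$ and $\mathbf{M}(k-1)$ are discrete-valued random vectors, Equation (2.36) may be used in conjunction with the Markov assumption for the model random process vector (condition 1) to write the above equation as

$$
\begin{aligned}
f(\mathbf{x}(k),\mathbf{M}(k),\mathbf{M}(k-1)|\mathbf{Z}^{k}) &= \sum_{i_k=1}^{N_f}\sum_{i_{k-1}=1}^{N_f} f(\mathbf{x}(k)|\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})\,p(\mathbf{M}_{i_k}|\mathbf{Z}^{k}) \\
&= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\sum_{i_{k-1}=1}^{N_f} f(\mathbf{x}(k)|\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})
\end{aligned}
\tag{2.50}
$$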

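where $p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})$ represents the merging probabilities and $\mu_{i_k}(k) = p(\mathbf{M}_{i_k}|\mathbf{Z}^{k})$ is the mode probability [4]. Using conditions 1, 2, and 3, the merging probabilities are given by (the numbers set above the approximation symbols indicate which of the three conditions are used)

$$
\begin{aligned}
p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k}) &= \frac{p(\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k})}{\sum_{i_{k-1}=1}^{N_f} p(\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k})} \\
&\overset{1,2}{\approx} \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}{\sum_{i_{k-1}=1}^{N_f} f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)} \\
&\overset{3}{\approx} \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\hat{\mathbf{x}}_{i_{k-1}}(k-1|k-1),\mathbf{P}_{i_{k-1}}(k-1|k-1))\,\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}{\sum_{i_{k-1}=1}^{N_f} f(\mathbf{z}(k)|\mathbf{M}_{i_k},\hat{\mathbf{x}}_{i_{k-1}}(k-1|k-1),\mathbf{P}_{i_{k-1}}(k-1|k-1))\,\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}
\end{aligned}
\tag{2.51}
$$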

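where $\Gamma_{i_k,i_{k-1}}$ is the mode transition probability given by $p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})$ (as in Equation (2.40)) and $\mu_{i_{k-1}}(k-1)$, as given in Equation (2.41), is the previous sample mode probability, $p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})$. When the measurement is available, the measurement pdf of this equation, $f(\mathbf{z}(k)|\mathbf{M}_{i_k},\hat{\mathbf{x}}_{i_{k-1}}(k-1|k-1),\mathbf{P}_{i_{k-1}}(k-1|k-1))$, becomes the likelihood function (2.32) for each of the $N_f^2$ models, with the appropriate changes to the parameters. In a similar fashion, the current mode probabilities are given by

$$
\begin{aligned}
\mu_{i_k}(k) &= p(\mathbf{M}_{i_k}|\mathbf{Z}^{k}) \\
&\overset{1,2,3}{\approx} \frac{\sum_{i_{k-1}=1}^{N_f} f(\mathbf{z}(k)|\mathbf{M}_{i_k},\hat{\mathbf{x}}_{i_{k-1}}(k-1|k-1),\mathbf{P}_{i_{k-1}}(k-1|k-1))\,\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}{\sum_{i_k=1}^{N_f}\sum_{i_{k-1}=1}^{N_f} f(\mathbf{z}(k)|\mathbf{M}_{i_k},\hat{\mathbf{x}}_{i_{k-1}}(k-1|k-1),\mathbf{P}_{i_{k-1}}(k-1|k-1))\,\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}.
\end{aligned}
\tag{2.52}
$$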
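The merging probabilities (2.51) and mode probabilities (2.52) share the same $N_f \times N_f$ array of terms $f(\cdot)\,\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)$: normalizing each row gives (2.51), and normalizing the row sums gives (2.52). The sketch below assumes the $N_f^2$ likelihood values have already been computed (for example, with Equation (2.32)); the function name `gpb2_probabilities` and the numbers are hypothetical.

```python
import numpy as np

def gpb2_probabilities(lik, T, mu_prev):
    """Merging probabilities (2.51) and mode probabilities (2.52).

    lik     : (N_f, N_f), lik[i_k, i_{k-1}] =
              f(z(k) | M_{i_k}, x_{i_{k-1}}(k-1|k-1), P_{i_{k-1}}(k-1|k-1))
    T       : (N_f, N_f), T[i_k, i_{k-1}] = Gamma_{i_k, i_{k-1}}
    mu_prev : (N_f,) mu_{i_{k-1}}(k-1)
    """
    joint = np.asarray(lik, float) * np.asarray(T, float) * np.asarray(mu_prev, float)[None, :]
    row_sums = joint.sum(axis=1)                 # one value per i_k
    merging = joint / row_sums[:, None]          # rows: p(M_{i_{k-1}} | M_{i_k}, Z^k)
    mu = row_sums / row_sums.sum()               # mu_{i_k}(k)
    return merging, mu

lik = np.array([[0.40, 0.10],
                [0.05, 0.30]])                  # hypothetical likelihood values
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])                      # hypothetical transition probabilities
merging, mu = gpb2_probabilities(lik, T, np.array([0.7, 0.3]))
```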

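Finally, after applying condition 3, the state random process vector pdf is

$$
\begin{aligned}
f(\mathbf{x}(k),\mathbf{M}(k),\mathbf{M}(k-1)|\mathbf{Z}^{k}) &\overset{3}{\approx} \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\sum_{i_{k-1}=1}^{N_f} f(\mathbf{x}(k)|\mathbf{z}(k),\mathbf{M}_{i_k},\hat{\mathbf{x}}_{i_{k-1}}(k-1|k-1),\mathbf{P}_{i_{k-1}}(k-1|k-1))\,p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k}) \\
&= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\sum_{i_{k-1}=1}^{N_f} p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})\,\mathcal{N}\{\mathbf{x}(k);\hat{\mathbf{x}}_{i_k}^{i_{k-1}}(k|k),\mathbf{P}_{i_k}^{i_{k-1}}(k|k)\}.
\end{aligned}
\tag{2.53}
$$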


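The inner summation is a Gaussian mixture with mean and covariance given by Equation (2.25). Once the inner summation is evaluated, the outer summation produces another Gaussian mixture. That is, first compute

$$
\begin{aligned}
\hat{\mathbf{x}}_{i_k}(k|k) &= \sum_{i_{k-1}=1}^{N_f} p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})\,\hat{\mathbf{x}}_{i_k}^{i_{k-1}}(k|k) \\
\mathbf{P}_{i_k}(k|k) &= \sum_{i_{k-1}=1}^{N_f} p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})\left[\mathbf{P}_{i_k}^{i_{k-1}}(k|k) + \left(\hat{\mathbf{x}}_{i_k}^{i_{k-1}}(k|k)-\hat{\mathbf{x}}_{i_k}(k|k)\right)\left(\hat{\mathbf{x}}_{i_k}^{i_{k-1}}(k|k)-\hat{\mathbf{x}}_{i_k}(k|k)\right)^{T}\right]
\end{aligned}
\tag{2.54}
$$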
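then calculate the GPB-2 overall mean and covariance estimates as

$$
\begin{aligned}
\hat{\mathbf{x}}(k|k) &= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\,\hat{\mathbf{x}}_{i_k}(k|k) \\
\mathbf{P}(k|k) &= \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\left[\mathbf{P}_{i_k}(k|k) + \left(\hat{\mathbf{x}}_{i_k}(k|k)-\hat{\mathbf{x}}(k|k)\right)\left(\hat{\mathbf{x}}_{i_k}(k|k)-\hat{\mathbf{x}}(k|k)\right)^{T}\right].
\end{aligned}
\tag{2.55}
$$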

Figure 2.6 depicts a block diagram of the GPB-2 algorithm. At the beginning of a processing cycle, each of the merged estimates is input into $N_f$ filters, and the same measurements are fed to each filter. The superscripts on the mean and covariance estimates at the outputs of the $N_f^2$ filters correspond to the indices of the inner summation in Equation (2.53). Once the inner summation is calculated for each $i_k$ of the outer summation, the estimates are merged after being scaled by the merging probabilities $p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})$. Finally, the merged estimates are scaled by the mode probabilities $\mu_{i_k}(k)$ and combined via Equation (2.55) into the overall mean and covariance estimate of the state random process vector.

Figure 2.6: A notional block diagram of the GPB-2 algorithm.
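The two-stage merging of Equations (2.54) and (2.55) is sketched below. `gpb2_merge` is a hypothetical helper, and the array layout `[i_k, i_{k-1}]` is an assumption made for illustration. A useful sanity check is that the two stages produce exactly the same overall mean and covariance as moment-matching all $N_f^2$ filter outputs at once with weights $\mu_{i_k}(k)\,p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})$.

```python
import numpy as np

def gpb2_merge(merging, mu, xs, Ps):
    """Equations (2.54) and (2.55).

    merging : (N_f, N_f) merging probabilities, rows indexed by i_k
    mu      : (N_f,) mode probabilities mu_{i_k}(k)
    xs, Ps  : (N_f, N_f, n) and (N_f, N_f, n, n) filter outputs indexed [i_k, i_{k-1}]
    """
    x_m = np.einsum("ab,abj->aj", merging, xs)                    # x_{i_k}(k|k), Equation (2.54)
    d = xs - x_m[:, None, :]
    P_m = (np.einsum("ab,abjk->ajk", merging, Ps)
           + np.einsum("ab,abj,abk->ajk", merging, d, d))         # P_{i_k}(k|k), Equation (2.54)
    x_all = mu @ x_m                                              # Equation (2.55), mean
    e = x_m - x_all
    P_all = np.einsum("a,ajk->jk", mu, P_m) + np.einsum("a,aj,ak->jk", mu, e, e)
    return x_m, P_m, x_all, P_all

# Random hypothetical inputs with N_f = 3 models and a 2-dimensional state.
rng = np.random.default_rng(1)
Nf, n = 3, 2
merging = rng.random((Nf, Nf)); merging /= merging.sum(axis=1, keepdims=True)
mu = rng.random(Nf); mu /= mu.sum()
xs = rng.normal(size=(Nf, Nf, n))
A = rng.normal(size=(Nf, Nf, n, n)); Ps = A @ np.swapaxes(A, -1, -2)
x_m, P_m, x_all, P_all = gpb2_merge(merging, mu, xs, Ps)
```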
2.4.2.3 Interacting Multiple Model Algorithm. The IMM algorithm achieves performance similar to that of GPB-2 but uses only the same number of filters as GPB-1 [4, 38]. For this reason IMM is the preferred approximation to the switching multiple model approach.


In [4], IMM is derived starting from the GPB-1 assumptions/approximations and later incorporating condition 3 from the GPB-2 subsection. This approach results in an algorithm in which the merging probabilities scale the estimate of the state random process vector from the previous cycle before the next cycle of the algorithm begins. In [38] and [ ], the derivation of IMM incorporates scaling by the merging probabilities after the end of each cycle.¹³ This form of the derivation emphasizes that the IMM algorithm reduces to the non-switching algorithm if estimate merging does not take place: skipping ahead to Figure 2.7, if one removes the "Model Estimate Merging" block and feeds the previous estimates into the appropriate filters, then the IMM block diagram is essentially the same as the non-switching model block diagram in Figure 2.4. Mathematically, it can be shown that IMM reduces to the non-switching model solution when $p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})$ is replaced by a Kronecker delta function, $\delta_{i_k i_{k-1}}$, for each pair of index values [33].


This subsection includes a third form of the IMM algorithm derivation, which emphasizes two points not highlighted in [4][33][38]. First, IMM may be viewed as a Gaussian mixture reduction approximation of GPB-2 that decreases the number of filters from $N_f^2$ to $N_f$. The second point is more subtle: the claim that IMM provides performance similar to GPB-2 becomes evident when the IMM algorithm is derived directly from GPB-2. Therefore, the third derivation of the IMM algorithm begins with GPB-2.


The inner summation of the GPB-2 target state pdf in Equation (2.53) is a Gaussian mixture. If the mixture is approximated by a single Gaussian pdf with the same overall mean and covariance as the original Gaussian mixture, then the number of filters is reduced from $N_f^2$ to $N_f$ (only the outer summation remains). In this way, IMM may be derived from GPB-2 by adding this Gaussian mixture reduction approximation to the assumption/approximation conditions listed in Subsection 2.4.2.2.

¹³ Whether merging occurs at the beginning or the end of a processing cycle is irrelevant; both methods are theoretically equivalent.

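Starting with the GPB-2 state random process vector pdf given in Equation (2.53), the IMM target state pdf is

$$
\begin{aligned}
f(\mathbf{x}(k),\mathbf{M}(k),\mathbf{M}(k-1)|\mathbf{Z}^{k}) &= f(\mathbf{x}(k)|\mathbf{M}(k),\mathbf{M}(k-1),\mathbf{Z}^{k})\,f(\mathbf{M}(k-1)|\mathbf{M}(k),\mathbf{Z}^{k})\,f(\mathbf{M}(k)|\mathbf{Z}^{k}) \\
&\overset{\text{GPB-2}}{\approx} \sum_{i_k=1}^{N_f}\mu_{i_k}(k)\sum_{i_{k-1}=1}^{N_f} p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})\,\mathcal{N}\{\mathbf{x}(k);\hat{\mathbf{x}}_{i_k}^{i_{k-1}}(k|k),\mathbf{P}_{i_k}^{i_{k-1}}(k|k)\} \\
&\overset{\text{IMM}}{\approx} \sum_{i_k=1}^{N_f} p(\mathbf{M}_{i_k}|\mathbf{Z}^{k})\,\mathcal{N}\{\mathbf{x}(k);\hat{\mathbf{x}}_{i_k}(k|k),\mathbf{P}_{i_k}(k|k)\}.
\end{aligned}
\tag{2.56}
$$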

The overset text on the first approximation, "GPB-2," indicates that the assumption/approximation conditions in Subsection 2.4.2.2 are invoked in the approximation. Likewise, the overset text on the second approximation indicates that the IMM single Gaussian pdf mixture reduction approximation is used. The IMM target state pdf appears to be the same as that for GPB-1 in Equation (2.47), but it fundamentally differs from the target state pdf of GPB-1 since the merging probability terms, $p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})$, are embedded in $\mathcal{N}\{\mathbf{x}(k);\hat{\mathbf{x}}_{i_k}(k|k),\mathbf{P}_{i_k}(k|k)\}$. Thus the performance of IMM is expected to be closer to that of GPB-2 than to that of GPB-1. The parameters of the pdf, $\hat{\mathbf{x}}_{i_k}(k|k)$ and $\mathbf{P}_{i_k}(k|k)$, are given by Equation (2.54). However, the merging probability $p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})$ is calculated differently from Equation (2.51).
The Bayes expansion of the merging probability $p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})$ differs from that of GPB-2 by conditioning on the current measurement history (through sample $k$) instead of the previous measurement history (through sample $k-1$). This modification enables merging of the previous cycle estimates prior to beginning the next cycle. To see this point, consider the expansion of $p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k})$ using Bayes' rule and marginal probability:


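$$
\begin{aligned}
p(\mathbf{M}_{i_{k-1}}|\mathbf{M}_{i_k},\mathbf{Z}^{k}) &= \frac{p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k})}{p(\mathbf{M}_{i_k}|\mathbf{Z}^{k})} \\
&= \frac{p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k})}{\sum_{i_{k-1}=1}^{N_f} p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k})} \\
&\approx \frac{\Gamma_{i_k,i_{k-1}}\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})}{\sum_{i_{k-1}=1}^{N_f}\Gamma_{i_k,i_{k-1}}\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})} \\
&= \frac{\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}{\sum_{i_{k-1}=1}^{N_f}\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}
\end{aligned}
\tag{2.57}
$$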


Two insights were used to obtain the last lines of this equation. First, $p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k})$ is taken to be equivalent to $p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})$, under the assumption that the measurement at sample $k$ has no impact on the model vector at sample $k-1$. Second, $p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k})$ is represented by the model transition probability, $\Gamma_{i_k,i_{k-1}}$. It seems counterintuitive that the probability of the current model would not depend on the current measurement. However, the Markov process assumption for $\mathbf{M}_{i_k}$ imposes the condition that the transition probability depends only on the previous model state. Therefore, the conditioning on $\mathbf{Z}^{k}$ in $p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k})$ is irrelevant under the Markov process assumption.
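The last line of Equation (2.57) needs only the transition probabilities and the previous mode probabilities, so it can be computed before the new measurement arrives. The sketch below (hypothetical function name and values) also shows the reduction noted earlier in this subsection: with an identity transition matrix, the merging probabilities become the Kronecker delta $\delta_{i_k i_{k-1}}$ and IMM reduces to the non-switching algorithm. It assumes every row of the weighted array has at least one nonzero entry.

```python
import numpy as np

def imm_merging_probabilities(T, mu_prev):
    """Last line of Equation (2.57); rows are indexed by i_k and columns by i_{k-1}."""
    weighted = np.asarray(T, dtype=float) * np.asarray(mu_prev, dtype=float)[None, :]
    return weighted / weighted.sum(axis=1, keepdims=True)   # each row sum is Equation (2.43)

T = np.array([[0.95, 0.05],
              [0.05, 0.95]])                                 # hypothetical Gamma values
merging = imm_merging_probabilities(T, [0.8, 0.2])
delta = imm_merging_probabilities(np.eye(2), [0.8, 0.2])    # non-switching case: identity matrix
```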

All that remains of the IMM algorithm derivation is to determine the mode probability, $\mu_{i_k}(k) = p(\mathbf{M}_{i_k}|\mathbf{Z}^{k})$, and the likelihood function, $f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})$. The mode probability is

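$$
\begin{aligned}
\mu_{i_k}(k) &= p(\mathbf{M}_{i_k}|\mathbf{z}(k),\mathbf{Z}^{k-1}) = \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_k}|\mathbf{Z}^{k-1})}{f(\mathbf{z}(k)|\mathbf{Z}^{k-1})} \\
&= \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\sum_{i_{k-1}=1}^{N_f} p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})}{\sum_{i_k=1}^{N_f} f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\sum_{i_{k-1}=1}^{N_f} p(\mathbf{M}_{i_k}|\mathbf{M}_{i_{k-1}},\mathbf{Z}^{k-1})\,p(\mathbf{M}_{i_{k-1}}|\mathbf{Z}^{k-1})} \\
&= \frac{f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\sum_{i_{k-1}=1}^{N_f}\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}{\sum_{i_k=1}^{N_f} f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\sum_{i_{k-1}=1}^{N_f}\Gamma_{i_k,i_{k-1}}\,\mu_{i_{k-1}}(k-1)}
\end{aligned}
\tag{2.58}
$$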


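(note that the mode probability for IMM is the same as that for GPB-1 in Equation (2.44)). Using Equation (2.32), the likelihood function $f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})$ given the realized observation $\mathbf{z}(k) = \mathbf{z}_k$ is

$$
\begin{aligned}
f(\mathbf{z}(k)|\mathbf{M}_{i_k},\mathbf{Z}^{k-1})\big|_{\mathbf{z}(k)=\mathbf{z}_k} &= L(\mathbf{z}_k;\mathbf{r}_{k,i_k},\mathbf{S}_{i_k}(k)) = \frac{\exp\left[-\tfrac{1}{2}\,\mathbf{r}_{k,i_k}^{T}\mathbf{S}_{i_k}^{-1}(k)\,\mathbf{r}_{k,i_k}\right]}{(2\pi)^{m/2}\sqrt{\det\mathbf{S}_{i_k}(k)}} \\
\mathbf{r}_{k,i_k} &= \mathbf{z}_k - \mathbf{H}_{i_k}\hat{\mathbf{x}}_{i_k}(k|k-1) \\
\mathbf{S}_{i_k}(k) &= \mathbf{H}_{i_k}\mathbf{P}_{i_k}(k|k-1)\mathbf{H}_{i_k}^{T} + \mathbf{R}_{i_k}(k).
\end{aligned}
\tag{2.59}
$$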
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Figure  2.7:  A  block  diagram  of  the  IMM  algorithm. 


Figure 2.7 represents the IMM algorithm. At the beginning of each processing cycle, the $N_f$ estimates of the last processing cycle are merged according to the merging probabilities of each model (filter). Merging occurs in the “Model Estimate Merging” block, which functions as a mixture reduction algorithm, reducing the $N_f$-order Gaussian mixture to a single Gaussian pdf. The filters then operate on the merged estimates (note that the subscripts on the mean and covariance estimates of the state random process vector change from $i_{k-1}$ to $i_k$, which corresponds to Equation (2.56)). After being scaled by the mode probability for each model, the outputs of the filters are combined into the overall estimates by the summation block.
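Both the “Model Estimate Merging” block and the final summation collapse a Gaussian mixture into a single Gaussian. A common way to do this, and the one assumed in this sketch, is moment matching: the merged Gaussian keeps the mixture's overall mean and covariance. In the merging step the weights would be the merging probabilities; for the overall output they would be the mode probabilities $\mu_{i_k}(k)$. The function name is hypothetical.

```python
def merge_gaussian_mixture(weights, means, covs):
    """Moment-matched single Gaussian N(m, P) for sum_i w_i N(m_i, P_i).

    m = sum_i w_i m_i
    P = sum_i w_i [P_i + (m_i - m)(m_i - m)^T]
    The weights are renormalized in case they do not sum exactly to one.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    n = len(means[0])
    mean = [sum(wi * mi[a] for wi, mi in zip(w, means)) for a in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for wi, mi, Pi in zip(w, means, covs):
        d = [mi[a] - mean[a] for a in range(n)]  # spread-of-the-means term
        for a in range(n):
            for b in range(n):
                cov[a][b] += wi * (Pi[a][b] + d[a] * d[b])
    return mean, cov


# Two scalar components at -1 and +1 with unit variance and equal weight.
m, P = merge_gaussian_mixture([0.5, 0.5], [[-1.0], [1.0]], [[[1.0]], [[1.0]]])
print(m, P)  # mean 0, variance 2: the spread of the means adds to each variance
```

The spread-of-the-means term is what keeps the merged covariance from understating the uncertainty when the component means disagree.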


2.4.3 Multiple Model Algorithms Summary. This section introduced the realistic problem of kinematics model parameter uncertainty encountered in target tracking. This uncertainty may be represented by the unknown elements of the $\Phi(k,k-1)$, $G_d$, and $Q_d$ matrices, which are mathematically represented as random quantities and inserted into a parameter vector to be estimated. Since the model parameters are random quantities, Bayes estimation can be used to find a recursive Bayesian solution. It was quickly determined that the model parameters had to be limited to a discrete set of values to provide any hope of a real-time implementation. After limiting the continuous set of possible values to a finite discrete set, two fundamental assumptions about the random nature of the model parameters were made, leading to two Bayesian solutions. The first solution assumed that the model parameters were time-invariant so that they were represented by a random vector $M$. This assumption led to the non-switching model recursive Bayesian solution. The fundamental drawback to representing the model parameters as random constants is that the resulting Bayesian solution presumes that the target travels according to only one model for all time. The second solution assumed that the model parameters were time-varying, and the model parameters were represented by a random process vector $M(k)$. This assumption led to the switching model recursive Bayesian solution. The advantage of this method is that the target is not assumed to travel according to only one model for all time. Both the rigorous non-switching and switching model recursive Bayesian solutions are unsuitable for practical implementation in a target tracking system.

Ad hoc modifications and approximations to the rigorous non-switching and switching model solutions, respectively, led to practical implementations for target tracking systems. MMAE, which is based on the non-switching model recursive Bayesian solution, utilizes ad hoc modal probability lower bounding to enable model switching. Additionally, filter re-initialization is used to improve the response of the algorithm to changes in target dynamics. With these modifications to the non-switching model solution in place, MMAE is a suitable multiple model algorithm for use in real-time tracking of a maneuvering target. GPB-1, GPB-2, and IMM incorporate various approximations to the rigorous switching model Bayesian solution to produce practical algorithms which inherently allow for target maneuvering. Of the three approximations, IMM provides performance approaching that of GPB-2 (which generally provides the best performance of the group), but with a computational burden on par with that of GPB-1 (and MMAE). In fact, IMM reduces to MMAE when the model transition probabilities are represented by Kronecker delta functions [33].


2.5  A  Bayesian  Approach  for  Measurement  Origin  Uncertainty 

This  section  introduces  another  practical  source  of  uncertainty:  the  origin  of 
measurements. One aspect of the data association problem is depicted in Figure 2.8,
in  which  fourteen  measurements  are  observed  when  only  two  targets  are  known  to 
exist.  Assuming  that  each  target  cannot  produce  more  than  one  return,  and  that 
feature  information  about  each  of  the  measurements  (e.g.,  amplitude,  phase,  etc.)  is 
not  available,  which  two  of  the  fourteen  observed  measurements  originated  from  the 
two  targets?  This  scenario  is  the  typical  data  association  problem  in  which  the  origin 
of  all  measurements  generated  by  sensors  in  a  scan  must  be  determined.  Another 
aspect  of  the  data  association  problem  allows  for  the  possibility  that,  for  example,  all 
fourteen  observed  measurements  were  generated  by  fourteen  new  targets  and  the  two 
existing  targets  were  not  detected. 

Measurements may be broadly categorized into true measurements or false-origin measurements. True measurements include those belonging to targets hypothesized in previous scans, which are used for track continuation, and those belonging to potential new targets, which are used to initiate new tracks. A track is a state vector trajectory estimated from a set of measurements that have been associated with the same target over some number of scans [5]. False-origin measurements may arise from clutter, countermeasures, or false alarms. Clutter consists of objects other than targets that create spurious returns. Countermeasures include decoys and jamming. False alarms are erroneous measurements caused by random sensor or environmental noise. The possibility of missed measurements also exists, which may occur when tracking low-observable targets, for instance.
Figure 2.8: An illustration of one aspect of the data association problem. Given more observed measurements than targets, how does one update the target state random process vector estimates (adapted from [38])? The pictures of the F/A-22 and S-37 represent the true locations of the targets.


Data association requires generating hypotheses about the origin of the measurements. These hypotheses are represented by association events, which are formed by labeling the measurements according to one of two underlying assumptions used in practice. The first assumption is that the true number of targets is known exactly. In this case, one would associate measurements to known tracks or to false sources (clutter, countermeasures, or false alarms). Under this assumption, the tracker operates according to a target-oriented data association method [5]. The second underlying assumption, which seems more widely applicable, is that the true number of targets is not known. Under this assumption, measurements may be associated with existing tracks that were hypothesized in previous scans, potential new tracks, or false sources. This association method is called the measurement-oriented approach [27].


A commonly used method to reduce the number of association events is to place a measurement gate around the predicted measurement, $\hat{z}_j(k|k-1)$, for each existing target $j$ that was hypothesized in a previous scan. Measurements that are within a measurement gate of an existing target at the current scan are hypothesized as potentially originating from that target, while measurements outside of the gate are not considered as candidates for association to that target. Instead, measurements outside all of the measurement gates are hypothesized as false-origin measurements when the target-oriented data association method is used, or as false-origin or potential new target measurements when using the measurement-oriented data association approach. Figure 2.9 shows the predicted measurements for the existing targets that were hypothesized in a previous scan as a dot and a square, and the true target locations are indicated by the F/A-22 and S-37 images. Although there are several types of measurement gates [7], Figure 2.9 depicts two elliptical measurement gates centered about the predicted measurement for each target. The size of each gate is related to $S_j(k)$, which is the covariance of the residual for each target $j$ (see Subsection 2.2.1), and it may be specified in terms of the probability, $P_G$, that the true target measurement falls within the gate [5]. Measurements outside the union of the measurement gates are considered too unlikely to have originated from the targets and, as a result, are hypothesized to have originated from other sources.

Figure 2.9: By using measurement gates based on the predicted measurement, $\hat{z}_j(k|k-1)$, for each existing target $j$ that was hypothesized in a previous scan, the number of potential association events is decreased (adapted from [38]).
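An elliptical gate can be sketched as a test on the normalized residual $d^2 = r^T S_j^{-1}(k)\, r$ with $r = z - \hat{z}_j(k|k-1)$. This minimal sketch assumes 2-D measurements and the common chi-square choice of threshold: for a Gaussian residual, $d^2$ is chi-square distributed with $m$ degrees of freedom, so for $m = 2$ the threshold that gives gate probability $P_G$ has the closed form $\gamma = -2\ln(1 - P_G)$. For other dimensions one would use a chi-square inverse CDF (for example, `scipy.stats.chi2.ppf`). The function names are hypothetical.

```python
import math


def gate_threshold_2d(p_gate):
    """Gate size gamma with P(d^2 <= gamma) = p_gate for a 2-D Gaussian residual.

    For m = 2 the chi-square CDF is 1 - exp(-gamma / 2), so gamma = -2 ln(1 - P_G).
    """
    return -2.0 * math.log(1.0 - p_gate)


def in_gate_2d(z, z_pred, S, gamma):
    """Elliptical gate test r^T S^-1 r <= gamma with r = z - z_pred (2-D only)."""
    r0, r1 = z[0] - z_pred[0], z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix: [[d, -b], [-c, a]] / det.
    d2 = (d * r0 * r0 - (b + c) * r0 * r1 + a * r1 * r1) / det
    return d2 <= gamma


gamma = gate_threshold_2d(0.99)  # about 9.21
S = [[1.0, 0.0], [0.0, 1.0]]
print(in_gate_2d([1.0, 1.0], [0.0, 0.0], S, gamma))  # True: d^2 = 2
print(in_gate_2d([5.0, 0.0], [0.0, 0.0], S, gamma))  # False: d^2 = 25
```

Because the threshold is applied to $d^2$, the gate automatically stretches along directions in which $S_j(k)$ reports large residual uncertainty.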

The union of all association events formed from labeling all of the measurements at sample $k$ creates the discrete sample space of the association event discrete random process vector $\Theta(k) = \Theta_{i_k}$. Thus, as in the switching model case of the previous section, the uncertainty about the true origin of the measurements is modeled by a discrete random process vector, and the mathematics of that section also apply to this section. The number of measurements received at each time instant is random in nature and, consequently, so is the size of the discrete sample space of $\Theta_{i_k}$. As a result, the number of association events is also random.

As an illustration of generating association events at some sample $k$, consider the following example. A confirmed track is to be updated by two measurements at sample $k$. Furthermore, to simplify the example, measurement gating is not used, so either measurement may be associated with the confirmed track. Using the measurement-oriented data association method, the origin of each measurement may be the existing track that was hypothesized in a previous scan, a potential new track, or a false source. Assume that each measurement originates from at most one source and that any, all, or none of the measurements may originate from potential new targets or false sources. Let $H_{FT}$, $H_{DT}$, and $H_{NT}$ represent the hypotheses that a measurement originated from a false source, the existing track hypothesized in a previous scan (“D” is for “detected”), or a potential new track, respectively. Also, let $z_{k,1}$ and $z_{k,2}$ denote the two observed measurements. Under these conditions, one may generate all of the feasible association events by creating tables in which each column corresponds to a distinct hypothesis ($H_{FT}$, $H_{DT}$, or $H_{NT}$), each row corresponds to a measurement ($z_{k,1}$ or $z_{k,2}$), and each element contains a “1” or a “0” depending on whether a hypothesis/measurement pair is considered “true” or “false,” respectively.

Figure 2.10 depicts tables containing all feasible association events for the example given above. Since there is only one confirmed track and two measurements at sample $k$, the number of potential targets is three using the measurement-oriented data association method (the two measurements may be from two new targets). A “1” in the cell corresponding to hypothesis $H_*$ and measurement $z_{k,*}$ indicates that the hypothesis/measurement pair is true, while a “0” means that the hypothesis/measurement pair is not true. For instance, the third table in the first row shows that measurement $z_{k,1}$ is associated with hypothesis $H_{FT}$, a false source, and measurement $z_{k,2}$ is associated with the confirmed target hypothesized in a previous scan, $H_{DT}$. The true association event is assumed to be included in the sample space of the association event random process vector, $\Theta(k)$. Note that each row of every table must sum to one since a measurement cannot originate from more than one source. Also, the entries in the columns corresponding to hypotheses $H_{FT}$ and $H_{NT}$ may sum to a number greater than one since any, all, or none of the measurements may have originated from false sources or potential new targets; the entries in the column corresponding to $H_{DT}$ may only sum to zero or one, since the actual target (pre-existing track) is assumed to generate at most one measurement.

Figure 2.10: Generated feasible association events at sample $k$ for two measurements and one confirmed track. This representation of the discrete sample space of $\Theta_{i_k}$ contains eight association events or hypotheses.
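The feasible events of Figure 2.10 can also be generated mechanically. In this sketch (the label strings and the function name are hypothetical), each event is a tuple holding one label per measurement, which corresponds to a table row containing a single “1”. The only constraint enforced is that an existing track receives at most one measurement, matching the column rule for $H_{DT}$; gating is not modeled, as in the example.

```python
from itertools import product


def association_events(n_meas, n_tracks):
    """Enumerate feasible measurement-oriented association events (no gating).

    Each measurement receives exactly one label: "FT" (false source),
    "NT" (potential new target), or ("DT", j) for existing track j.
    FT and NT may be reused freely; each existing track takes at most one
    measurement.
    """
    labels = ["FT", "NT"] + [("DT", j) for j in range(n_tracks)]
    events = []
    for event in product(labels, repeat=n_meas):
        tracks = [lab[1] for lab in event if isinstance(lab, tuple)]
        if len(tracks) == len(set(tracks)):  # no track claims two measurements
            events.append(event)
    return events


events = association_events(2, 1)
print(len(events))  # 8, matching the eight tables of Figure 2.10
for e in events:
    print(e)
```

Brute-force enumeration like this is only practical for tiny scans; it is meant to show the structure of the sample space, not to replace the counting formulas used later.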

The remainder of this section is dedicated to finding a rigorous Bayesian solution for the measurement origin uncertainty problem encountered in target tracking. The measurement-oriented data association method is chosen over the target-oriented data association method because it is the more general of the two methods, and the target-oriented method is readily obtained from the measurement-oriented approach. Because measurement origin uncertainty is modeled in the same way as the kinematics model parameter uncertainty for switching models, the recursive Bayesian solution is mathematically similar to the solution found in Subsection 2.4.2. Consequently, the recursive Bayesian solution suffers from the same computational difficulties as the switching model solution. Specifically, the number of association events, or hypotheses, grows exponentially over time, and some type of hypothesis reduction routine is necessary if one wishes to implement this solution in a practical tracking system.


Approximations of the recursive Bayesian solution for the measurement origin uncertainty problem may be divided into two categories.¹⁴ The first category includes the Probabilistic Data Association Filter (PDAF) (for single-target scenarios), the Joint Probabilistic Data Association Filter (JPDAF) and its variants (for multiple-target scenarios), and the N-scan filter with N set to one. PDAF and JPDAF approximate the rigorous solution in the same manner as the GPB-1 approximation, in which the target state Gaussian mixture pdf is approximated by a single Gaussian pdf at the end of each scan cycle [5]. The N-scan filter, with N set to one, uses an approximation similar to that used for the GPB-2 algorithm [5]. These three algorithms rely on the target-oriented data association method, so the number of targets is assumed known a priori. The second category includes the Joining and Clustering [30], [31] and Integral Square Error [38], [40], [8] mixture reduction algorithms, which are approximations that may use either of the two data association methods. Both algorithms reduce the number of mixture components in the target state Gaussian mixture pdf at the end of each scan based on preset criteria and, in most cases, provide a better approximation to the target state pdf than a single Gaussian pdf approximation.

¹⁴ Measurement gating is another type of approximation to the rigorous Bayesian solution. This approximation reduces the number of terms necessary for computing the target state pdf by effectively limiting the sample space of $\Theta_{i_k}$ through the restrictions imposed by the measurement gates.


2.5.1 A Bayesian Solution for Measurement Origin Uncertainty. Although this thesis only considers tracking a single target in clutter using the target-oriented data association method, a Bayesian solution for tracking multiple targets in the presence of measurement origin uncertainty utilizing the measurement-oriented data association method is developed in this subsection based on [5], [27], [38]. The reason multiple targets and the measurement-oriented data association method are considered is twofold. First, the multiple-target solution using the measurement-oriented data association method is more general. Second, this solution readily reduces to the single-target, target-oriented data association solution.


A new set of notation is needed for the scenario of tracking multiple targets in the presence of measurement origin uncertainty. When considering more than one target, the state random process vector must include the state random process vectors of all of the targets. Thus, a joint target state random process composite vector is formed as

$$
X(k) = \begin{bmatrix} x_1(k) \\ \vdots \\ x_{N_T(k)}(k) \end{bmatrix}
\qquad (2.60)
$$
where $N_T(k)$ is the number of targets at sample $k$. The number of targets is constrained to equal the sum of the number of detected targets, $N_{DT}$, and the number of new targets, $N_{NT}$, at scan $k$. Since potentially more than one measurement vector is available at each scan due to measurement origin uncertainty, the composite measurement random process vector is represented by

$$
Z(k) = \begin{bmatrix} z_1(k) \\ \vdots \\ z_{N_m(k)}(k) \end{bmatrix}
\qquad (2.61)
$$


where $N_m(k)$ is the random number of measurements at sample $k$. Upon observation of the measurements, the composite measurement random process vector becomes the realized composite measurement vector given by

$$
Z_k = \begin{bmatrix} z_{k,1} \\ \vdots \\ z_{k,N_m(k)} \end{bmatrix}
\qquad (2.62)
$$


Using the new definition of the composite measurement vector, the measurement history composite vector of Subsection 2.2.1 becomes

$$
Z^k = \begin{bmatrix} Z(1) \\ \vdots \\ Z(k) \end{bmatrix}
\qquad (2.63)
$$


If one only considers a single-target scenario utilizing the target-oriented data association method, then Equation (2.60) is simply a single target state vector and the number of new targets, $N_{NT}$, is zero. Equations (2.61), (2.62), and (2.63) remain unchanged. Using this new notation, a recursive Bayesian solution may be formulated in the same manner as in Subsection 2.4.2.
The joint target state random process composite vector conditioned on the measurement history is

$$
\begin{aligned}
f(X(k), \Theta(k) \mid Z^k) &= f(X(k) \mid \Theta(k), Z^k)\, f(\Theta(k) \mid Z^k) \\
&= f(X(k) \mid \Theta(k), Z^k) \sum_{i_k=1}^{N_H(k)} p(\Theta(k) = \Theta_{i_k} \mid Z^k)\, \delta(\Theta(k) - \Theta_{i_k}) \\
&= \sum_{i_k=1}^{N_H(k)} f(X(k) \mid \Theta_{i_k}, Z^k)\, p(\Theta_{i_k} \mid Z^k)\, \delta(\Theta(k) - \Theta_{i_k}).
\end{aligned}
\qquad (2.64)
$$

The second line of this equation follows from the generalized definition of a pdf (see Subsection 2.4.1) since $\Theta(k)$ is a discrete random process vector, and $N_H(k)$ represents the number of hypotheses or association events in the discrete sample space of the association event discrete random process vector $\Theta_{i_k}$ (see Figure 2.10 for an example sample space). A deferred decision approach is desired so that the previous association events are incorporated into the decision criterion for evaluating association hypotheses. Therefore, the Bayesian solution should include the entire association event history. The above equation is then written in terms of the joint pdf of $\Theta(\ell)$ for all time, $\ell = 1, \dots, k$, as

$$
f(X(k), \{\Theta(\ell)\}_1^k \mid Z^k) = \sum_{i_k=1}^{N_H(k)} \cdots \sum_{i_1=1}^{N_H(1)} f(X(k) \mid \{\Theta_{i_\ell}\}_1^k, Z^k)\, p(\{\Theta_{i_\ell}\}_1^k \mid Z^k)\, \delta(\{\Theta(\ell)\}_1^k - \{\Theta_{i_\ell}\}_1^k).
\qquad (2.65)
$$

As in the switching model case, the Bayesian solution results in a Gaussian mixture (in this case, mixtures of mixtures). It is computationally intractable unless approximated using one of the methods presented in the section introduction. The remainder of this subsection is dedicated to determining $p(\{\Theta_{i_\ell}\}_1^k|Z^k)$, since $f(X(k)|\{\Theta_{i_\ell}\}_1^k, Z^k)$ is provided by the Kalman filter solution given that there is no measurement origin uncertainty.
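As a sketch of the structure of Equations (2.64) and (2.65), the code below evaluates a scalar Gaussian mixture whose weights stand in for $p(\Theta_{i_k}|Z^k)$ and whose components stand in for the Kalman filter pdfs $f(X(k)|\Theta_{i_k}, Z^k)$, and it counts the terms in the nested sums of Equation (2.65). The scalar state, the function names, and the choice of eight events per scan (as in Figure 2.10) are illustrative assumptions; in practice the number of events per scan is random.

```python
import math


def mixture_pdf(x, weights, means, variances):
    """Scalar Gaussian mixture sum_i w_i N(x; m_i, v_i), the form of Eq. (2.64).

    weights play the role of p(Theta_i | Z^k); each (m_i, v_i) stands for the
    Kalman-filter conditional pdf f(X(k) | Theta_i, Z^k).
    """
    return sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
               for w, m, v in zip(weights, means, variances))


def num_history_terms(n_hyp_per_scan):
    """Number of terms in the nested sums of Eq. (2.65): N_H(1) * ... * N_H(k)."""
    return math.prod(n_hyp_per_scan)


print(mixture_pdf(0.0, [0.7, 0.3], [0.0, 2.0], [1.0, 0.5]))
print(num_history_terms([8] * 10))  # 1073741824 = 8**10 terms after ten scans
```

The second print is the practical point of this subsection: without a mixture reduction step, the term count multiplies at every scan.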
The joint conditional pmf, $p(\{\Theta_{i_\ell}\}_1^k|Z^k)$, of the current association event history, given the measurement history, $Z^k$, is [5]

$$
\begin{aligned}
p(\{\Theta_{i_\ell}\}_1^k \mid Z^k) &= \frac{f(\Theta_{i_k}, \{\Theta_{i_\ell}\}_1^{k-1}, Z(k), Z^{k-1}, N_m(k))}{f(Z(k), Z^{k-1}, N_m(k))} \\
&= \frac{f(Z(k) \mid \Theta_{i_k}, \{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k))\; p(\Theta_{i_k} \mid \{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k))\; p(\{\Theta_{i_\ell}\}_1^{k-1} \mid Z^{k-1})}{f(Z(k) \mid Z^{k-1}, N_m(k))}
\end{aligned}
\qquad (2.66)
$$

for $i_k = 1, \dots, N_H(k)$, $i_\ell = 1, \dots, N_H(\ell)$, and $\ell = 1, \dots, k-1$. Notice that the conditioning on the number of measurements at sample $k$, $N_m(k)$,¹⁵ is included in the above pdfs/pmfs since this quantity determines the number of hypotheses in the sample space of $\Theta_{i_k}$ (see the example in Figure 2.10). Furthermore, conditioning on the current measurement history $Z^k$ implies conditioning on the number of measurements, $N_m(k)$, even though this conditioning is not explicitly shown in the left-hand side of Equation (2.66). Also, the conditioning on $N_m(k)$ was dropped in the pmf $p(\{\Theta_{i_\ell}\}_1^{k-1}|Z^{k-1})$ because the number of current measurements has no bearing on the prior association event history. Equation (2.66) is evaluated in the next three subsections.


¹⁵ This number represents the total number of measurements made in the entire surveillance region so that a new track initiation capability is maintained [27].
2.5.1.1 The Composite Measurement Likelihood Function. The first term in the numerator of the second line of Equation (2.66), the composite measurement likelihood function, is evaluated in this subsection. The distribution of each measurement random process vector contained in the composite measurement vector in Equation (2.61) is assumed to be either Gaussian, for measurements hypothesized as originating from existing targets hypothesized in a previous scan, or uniform, for measurements hypothesized as originating from new targets or false sources.¹⁶ For each realized measurement vector at sample $k$, $z_{k,j}$, hypothesized under association event $\Theta_{i_k}$ ($i_k = 1, \dots, N_H(k)$) as originating from a corresponding existing target $j$ ($j = 1, \dots, N_{DT,i_k}$) that was hypothesized in a previous scan and detected in the current scan, the measurement likelihood function for the $j$th existing target is

$$
\begin{aligned}
f(z_j(k) \mid \Theta_{i_k}, \{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k))\big|_{z_j(k)=z_{k,j}} &= L(z_{k,j}; r_{k,j}, S_j(k)) = \frac{\exp\left[-\tfrac{1}{2}\, r_{k,j}^T S_j^{-1}(k)\, r_{k,j}\right]}{(2\pi)^{m/2}\sqrt{\det S_j(k)}} \\
r_{k,j} &= z_{k,j} - H_j\hat{x}_j(k|k-1) \\
S_j(k) &= H_j P_j(k|k-1) H_j^T + R_j(k).
\end{aligned}
\qquad (2.67)
$$


The measurement likelihood function for a measurement hypothesized under association event $\Theta_{i_k}$ as originating from a potential new target or false source is represented by $1/V_S$, where $V_S$ is the surveillance volume [5], [27].


¹⁶ In previous sections of this chapter, a measurement originating from a target is assumed to be distributed as a Gaussian random quantity since the target state random process vector is assumed to be Gaussian. In this section, a measurement could also originate from a new target or a false source. Measurements originating from new targets or false sources are assumed to appear at random locations in the surveillance volume with equal probability. Thus, a measurement hypothesized to originate from a new target or false source is mathematically modeled as a uniformly distributed random variable.
Typically, the target state random process vectors, which form the joint target state random process composite vector in Equation (2.60), are assumed to be mutually independent [5]. Using this assumption and the assumption that the measurement noise process vector in Equation (2.4) is independent of the state process vector, the measurement random process vectors forming the composite measurement vector in Equation (2.61) are also mutually independent.¹⁷ To evaluate the composite likelihood function, $f(Z(k)|\Theta_{i_k}, \{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k))$, first note that the total number of measurements at scan $k$, $N_m(k)$, is equal to the sum of the numbers of measurements hypothesized under a given association event, $\Theta_{i_k}$, as originating from targets that were hypothesized in a previous scan and detected in the current scan ($N_{DT,i_k}$), potential new targets ($N_{NT,i_k}$), and false sources ($N_{FT,i_k}$). Invoking the mutual independence of the measurement random process vectors and the modeling assumptions in the previous paragraph, the composite measurement likelihood function may be written as

$$
\begin{aligned}
f(Z(k) \mid \Theta_{i_k}, \{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k))\big|_{Z(k)=Z_k} &= \left(\frac{1}{V_S}\right)^{N_{FT,i_k}} \left(\frac{1}{V_S}\right)^{N_{NT,i_k}} \prod_{j=1}^{N_{DT,i_k}} L(z_{k,j}; r_{k,j}, S_j(k)) \\
&= \left(\frac{1}{V_S}\right)^{N_{FT,i_k} + N_{NT,i_k}} \prod_{j=1}^{N_{DT,i_k}} \frac{\exp\left[-\tfrac{1}{2}\, r_{k,j}^T S_j^{-1}(k)\, r_{k,j}\right]}{(2\pi)^{m/2}\sqrt{\det S_j(k)}}
\end{aligned}
\qquad (2.68)
$$

for $i_k = 1, \dots, N_H(k)$. The terms $r_{k,j}$ and $S_j(k)$ are defined in Equation (2.67). If measurement gating is used, then measurements falling outside of the gate of an existing target cannot be hypothesized to have originated from that target. In addition, if an association event consists of hypotheses which do not associate any measurement to existing targets hypothesized in the previous scan and detected in the current scan, then $N_{DT,i_k}$ is zero for that association event, and the product term, $\prod_{j=1}^{N_{DT,i_k}}(\cdot)$, in Equation (2.68) is not evaluated (it is an empty product, equal to one).

¹⁷ One possible intuitive justification for this assumption lies in the previously made condition that a measurement cannot originate from multiple sources. Since each measurement is assumed to originate from a distinct source, it seems intuitive to believe that the measurements are mutually independent.

Equation (2.68) may be applied to the example given in the section introduction and shown in Figure 2.10. Consider association event $\Theta_{7_k}$, in which both measurements are hypothesized to have originated from false sources. For this association event, the composite measurement likelihood becomes $(1/V_S)^2$ since $N_{DT,7_k}$ and $N_{NT,7_k}$ equal zero, and $N_{FT,7_k}$ is two (recall that hypothesis $H_{FT}$ supposes that a measurement is due to a false source). As another example of using Equation (2.68), consider association event $\Theta_{2_k}$ (in the same figure), which hypothesizes that the first measurement is due to the existing target hypothesized in a previous scan (as indicated by hypothesis $H_{DT}$) and that the other measurement is due to a new target (hypothesized under $H_{NT}$). Its composite measurement likelihood function is $(1/V_S)\,L(z_{k,1}; r_{k,1}, S_1(k))$.
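Equation (2.68) can be sketched for scalar ($m = 1$) measurements as follows. The event labels, the function name, and the numbers are illustrative assumptions, and gating is ignored. The two calls mirror the two worked examples above: both measurements from false sources gives $(1/V_S)^2$, and one detection plus one new target gives $(1/V_S)\,L(z_{k,1}; r_{k,1}, S_1(k))$.

```python
import math


def composite_likelihood(event, z, z_pred, s, volume):
    """Eq. (2.68) for scalar measurements (m = 1).

    event[i] labels measurement i as "FT", "NT", or ("DT", j) for existing
    target j; z_pred[j] and s[j] are that target's predicted measurement and
    residual variance. FT and NT measurements each contribute 1/V_S.
    """
    value = 1.0
    for zi, label in zip(z, event):
        if label in ("FT", "NT"):
            value *= 1.0 / volume
        else:
            j = label[1]
            r = zi - z_pred[j]  # residual r_{k,j}
            value *= math.exp(-0.5 * r * r / s[j]) / math.sqrt(2.0 * math.pi * s[j])
    return value


V_S = 100.0          # illustrative surveillance "volume" (a length, since m = 1)
z = [0.2, 7.0]       # z_{k,1}, z_{k,2}
z_hat, s = [0.0], [1.0]
print(composite_likelihood(("FT", "FT"), z, z_hat, s, V_S))       # (1/V_S)^2
print(composite_likelihood((("DT", 0), "NT"), z, z_hat, s, V_S))  # (1/V_S) L(z_{k,1}; r_{k,1}, S_1)
```

Because each measurement contributes one independent factor, the likelihood of any event is a simple product, which is what makes evaluating many hypotheses per scan feasible.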

2.5.1.2 The Conditional Current Association Event pmf. Consider the conditional current association event pmf, conditioned on the current number of measurements and the prior association event and measurement histories, $p(\Theta_{i_k}|\{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k))$. To evaluate this pmf, one must either wisely choose a discrete probability distribution model for $\Theta_{i_k}$ (e.g., discrete uniform, Poisson, multinomial, etc.) or use some other assumption about the random quantity $\Theta_{i_k}$. Any choice of probability distribution model that does not incorporate prior knowledge about the detection capabilities of the tracking system or other engineering insights, such as the number of expected false-origin measurements or potential new targets in a given surveillance volume, would be unwise. Such prior knowledge could be incorporated into the current association event pmf through $N_{DT,i_k}$, $N_{NT,i_k}$, and $N_{FT,i_k}$ by modeling these integer-valued variables as appropriately chosen random variables. This approach is taken in [5], [27], and it is reproduced in this subsection.

First, model the numbers of measurements hypothesized according to the $i_k$th association event as originating from existing targets hypothesized in a previous scan and detected in the current scan ($N_{DT,i_k}$), potential new targets ($N_{NT,i_k}$), and false sources ($N_{FT,i_k}$) as integer-valued random variables. Next, form the joint conditional pmf of the random quantities $\Theta_{i_k}$, $N_{DT,i_k}$, $N_{NT,i_k}$, and $N_{FT,i_k}$. Applying the conditional probability rule for pmfs to this joint conditional pmf yields the desired result

$$
\begin{aligned}
p(\Theta_{i_k}, N_{DT,i_k}, N_{NT,i_k}, N_{FT,i_k} \mid \{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k)) &= p(\Theta_{i_k} \mid N_{DT,i_k}, N_{NT,i_k}, N_{FT,i_k}, \{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k)) \\
&\quad \cdot\, p(N_{DT,i_k}, N_{NT,i_k}, N_{FT,i_k} \mid \{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k)).
\end{aligned}
\qquad (2.69)
$$

Because $N_{DT,i_k}$, $N_{NT,i_k}$, and $N_{FT,i_k}$ are completely determined once the association event $\Theta_{i_k}$ is specified, the left-hand side of Equation (2.69) equals the desired pmf, $p(\Theta_{i_k}|\{\Theta_{i_\ell}\}_1^{k-1}, Z^{k-1}, N_m(k))$. The conditional pmf of the current association event is now conditioned on the information one may possess about the detection capabilities, the expected number of false-origin measurements, and the expected number of potential new targets. This conditioning achieves the desired goal of the previous paragraph. The second conditional pmf in Equation (2.69) depends on the probability distribution assigned to each of the integer-valued random variables $N_{DT,i_k}$, $N_{NT,i_k}$, and $N_{FT,i_k}$. Both pmfs on the right-hand side of Equation (2.69) are evaluated next.


If  all  association  events  containing  the  same  number  of  detected  targets,  the 
same  number  of  new  target  measurements,  and  the  same  number  of  false-origin  mea¬ 


surements  are  considered  equally  likely  [38] ,  then  the  new  conditional  current  associa¬ 
tion  event  pmf  p{®ik\NDT,ik,  NNTjik,  NFTjik,  Zk~l ,  Nm(k ))  may  be  evaluated 

by  counting  methods,  like  those  found  in  [20].  Assuming  that  measurement  gating  is 
not  used  (to  simplify  the  mattei[L8|),  the  total  number  of  association  events  is  found 
by  calculating  the  product  of  the  number  of  ways  to  partition  Nm(k )  measurements 
into  Nr>T,ik,  A rNT,ik,  and  NFF,ik  mutually  exclusive  and  exhaustive  partitions,  and  the 
number  of  ways  to  assign  NFT,ik  measurements  to  the  existing  targets  hypothesized 


$^{18}$Measurement gating imposes a restriction on associations between observed measurements and existing targets hypothesized in a previous scan, since observations outside the measurement gate of an existing target may not be assigned to that target. This restriction makes developing a general equation for the total number of association events very difficult due to the random nature of the measurements (i.e., one does not know a priori where measurements may fall within a surveillance volume).
under the association event history through sample $k-1$, $\{\Theta_{i_l}\}_{l=1}^{k-1}$. The first quantity in the product is found by using the multinomial coefficient [20][27],

$$
\frac{N_m(k)!}{N_{DT,i_k}!\,N_{NT,i_k}!\,N_{FT,i_k}!}, \tag{2.70}
$$


since this coefficient is the number of ways to partition the total number of measurements at scan $k$ into three mutually exclusive and exhaustive groups. The number of ways to assign the hypothesized number of measurements associated with detected targets in the current scan $k$, $N_{DT,i_k}$, to the total number of existing targets hypothesized under the association event history through sample $k-1$, denoted as $N_{TGT}$, is

$$
\frac{N_{TGT}!}{(N_{TGT}-N_{DT,i_k})!}. \tag{2.71}
$$
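The two counting factors can be checked by brute force. The following Python sketch (function names are hypothetical; as in the text, gating is ignored) labels each of the $N_m(k)$ measurements as detected-target (DT), new-target (NT), or false (FT), assigns the DT measurements to distinct existing targets in every possible order, and compares the resulting count with the product of Equations (2.70) and (2.71).

```python
from itertools import permutations, product
from math import factorial


def count_by_formula(n_m, n_dt, n_nt, n_ft, n_tgt):
    """Multinomial coefficient (2.70) times the ordered-sampling count (2.71)."""
    assert n_dt + n_nt + n_ft == n_m and n_dt <= n_tgt
    partitions = factorial(n_m) // (factorial(n_dt) * factorial(n_nt) * factorial(n_ft))
    assignments = factorial(n_tgt) // factorial(n_tgt - n_dt)
    return partitions * assignments


def count_by_enumeration(n_m, n_dt, n_nt, n_ft, n_tgt):
    """Brute-force count of association events with the given group sizes."""
    total = 0
    for labels in product(("DT", "NT", "FT"), repeat=n_m):
        counts = (labels.count("DT"), labels.count("NT"), labels.count("FT"))
        if counts != (n_dt, n_nt, n_ft):
            continue
        # every ordered choice of n_dt distinct existing targets is one assignment
        total += sum(1 for _ in permutations(range(n_tgt), n_dt))
    return total


# Four measurements: two from existing targets, one new target, one false.
print(count_by_formula(4, 2, 1, 1, 3), count_by_enumeration(4, 2, 1, 1, 3))  # 72 72
```

Under the equal-likelihood assumption, each individual event with these group sizes is assigned the reciprocal of this count.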


(Notice that Equation (2.71) is the “sampling without replacement and with ordering” equation found in [20].) Combining Equations (2.70) and (2.71) yields the new conditional current association event pmf (the first pmf on the right-hand side of Equation (2.69)) [27]:


$$
p\left(\Theta_{i_k} \,\middle|\, N_{DT,i_k}, N_{NT,i_k}, N_{FT,i_k}, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right)
= \left[\frac{N_m(k)!}{N_{DT,i_k}!\,N_{NT,i_k}!\,N_{FT,i_k}!} \cdot \frac{N_{TGT}!}{(N_{TGT}-N_{DT,i_k})!}\right]^{-1}. \tag{2.72}
$$

Technically, Equation (2.72) is not a true pmf since it is not normalized. A normalization factor could be included in this expression; however, since the joint conditional pmf of the current association event history, Equation (2.66), is normalized, normalization of Equation (2.72) is omitted.

The joint conditional pmf of $N_{DT,i_k}$, $N_{NT,i_k}$, and $N_{FT,i_k}$, which appears in the last line of Equation (2.69), may be found by first assuming that the random variables
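$N_{DT,i_k}$, $N_{NT,i_k}$, and $N_{FT,i_k}$ are mutually independent$^{19}$. Then, the joint pmf of these random variables becomes

$$
\begin{aligned}
p\left(N_{DT,i_k}, N_{NT,i_k}, N_{FT,i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right)
&= p\left(N_{DT,i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right) p\left(N_{NT,i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right) \\
&\quad \cdot p\left(N_{FT,i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right).
\end{aligned}
\tag{2.73}
$$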

The random variable $N_{DT,i_k}$ is modeled as Binomial with parameters $P_D$, the probability of detection, and $N_{TGT}$, the total number of existing targets hypothesized under the association event history through sample $k-1$. The pmf of this Binomial random variable is [20][27]

$$
p\left(N_{DT,i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right)
= \frac{N_{TGT}!}{N_{DT,i_k}!\,(N_{TGT}-N_{DT,i_k})!}\, P_D^{N_{DT,i_k}} (1-P_D)^{N_{TGT}-N_{DT,i_k}}. \tag{2.74}
$$


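Poisson random variables are used to model $N_{NT,i_k}$ and $N_{FT,i_k}$, and the pmf for each random variable is [20][27]

$$
\begin{aligned}
p\left(N_{NT,i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right) &= \frac{(\lambda_{NT} V_s)^{N_{NT,i_k}}}{N_{NT,i_k}!}\, e^{-\lambda_{NT} V_s}, \\
p\left(N_{FT,i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right) &= \frac{(\lambda_{FT} V_s)^{N_{FT,i_k}}}{N_{FT,i_k}!}\, e^{-\lambda_{FT} V_s}.
\end{aligned}
\tag{2.75}
$$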


Since the parameter of a Poisson pmf has the units of “average number of events,” the constants $\lambda_{NT} V_s$ and $\lambda_{FT} V_s$ may be interpreted as the expected number of new targets and false-origin measurements in a given surveillance volume, $V_s$. The terms $\lambda_{FT}$ and


$^{19}$Williams [38] pointed out that the conditioning on $N_m(k)$ in this pmf makes this independence assumption questionable, since the random variables $N_{DT,i_k}$, $N_{NT,i_k}$, and $N_{FT,i_k}$ are mathematically related by the equation $N_m(k) = N_{DT,i_k} + N_{NT,i_k} + N_{FT,i_k}$. However, it was shown in [38] that the mutual independence assumption may be made without conditioning on the number of current measurements by applying Bayes' rule to remove this conditioning.
$\lambda_{NT}$ are the false-origin measurement clutter density and new target measurement clutter density (expected numbers per unit surveillance volume), respectively.
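A minimal sketch of the count models in Equations (2.74) and (2.75) follows. The values used for $P_D$, $\lambda_{FT}$, and $V_s$ are illustrative assumptions, not numbers from the text; the last line checks numerically that the mean of the Poisson pmf is $\lambda V_s$, the interpretation given above.

```python
from math import comb, exp, factorial


def p_detected(n_dt, n_tgt, p_d):
    """Binomial pmf (2.74): n_dt of the n_tgt existing targets are detected."""
    return comb(n_tgt, n_dt) * p_d**n_dt * (1.0 - p_d) ** (n_tgt - n_dt)


def p_poisson(n, density, volume):
    """Poisson pmf (2.75) with mean density * volume (lambda * V_s)."""
    mean = density * volume
    return mean**n * exp(-mean) / factorial(n)


# Hypothetical values: P_D = 0.9, lambda_FT = 0.5 per unit volume, V_s = 4.
print(p_detected(1, 2, 0.9))  # 2 * 0.9 * 0.1
expected_false = sum(n * p_poisson(n, 0.5, 4.0) for n in range(100))
print(expected_false)  # approximately lambda_FT * V_s = 2
```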


Finally, combining Equations (2.72), (2.74), and (2.75), and canceling the terms $N_{TGT}!$, $N_{DT,i_k}!$, $(N_{TGT}-N_{DT,i_k})!$, $N_{FT,i_k}!$, and $N_{NT,i_k}!$, the conditional current association event pmf under the measurement-oriented data association assumption and in the absence of measurement gating is [27]

$$
\begin{aligned}
p\left(\Theta_{i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right)
&= \frac{1}{N_m(k)!}\, P_D^{N_{DT,i_k}} (1-P_D)^{N_{TGT}-N_{DT,i_k}} \\
&\quad \cdot (\lambda_{FT} V_s)^{N_{FT,i_k}} e^{-\lambda_{FT} V_s} \cdot (\lambda_{NT} V_s)^{N_{NT,i_k}} e^{-\lambda_{NT} V_s}
\end{aligned}
\tag{2.76}
$$

for $i_k = 1, \ldots, N_H(k)$. Notice that this pmf includes information a designer might possess about the detection capabilities of a tracking system and the expected number of false-origin and new target measurements generated by the tracker in a certain surveillance volume. Also, this pmf may be modified for a target-oriented data association approach by setting $N_{NT,i_k}$ to zero with probability one.
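The factorial cancellations leading to Equation (2.76) can be verified numerically. The sketch below (hypothetical function names and parameter values) multiplies the equal-likelihood event pmf, taken as the reciprocal of the event count from (2.70) and (2.71), by the Binomial and Poisson pmfs of (2.74) and (2.75), and compares the product with the closed form $P_D^{N_{DT,i_k}}(1-P_D)^{N_{TGT}-N_{DT,i_k}}(\lambda_{FT}V_s)^{N_{FT,i_k}}e^{-\lambda_{FT}V_s}(\lambda_{NT}V_s)^{N_{NT,i_k}}e^{-\lambda_{NT}V_s}/N_m(k)!$.

```python
from math import comb, exp, factorial, isclose


def product_form(n_dt, n_nt, n_ft, n_tgt, p_d, lam_ft, lam_nt, vol):
    """(2.72) x (2.74) x (2.75), before any cancellation."""
    n_m = n_dt + n_nt + n_ft
    n_events = (factorial(n_m) // (factorial(n_dt) * factorial(n_nt) * factorial(n_ft))
                * (factorial(n_tgt) // factorial(n_tgt - n_dt)))
    p_event = 1.0 / n_events  # every event with these counts is equally likely
    p_dt = comb(n_tgt, n_dt) * p_d**n_dt * (1 - p_d) ** (n_tgt - n_dt)
    p_nt = (lam_nt * vol) ** n_nt * exp(-lam_nt * vol) / factorial(n_nt)
    p_ft = (lam_ft * vol) ** n_ft * exp(-lam_ft * vol) / factorial(n_ft)
    return p_event * p_dt * p_nt * p_ft


def closed_form(n_dt, n_nt, n_ft, n_tgt, p_d, lam_ft, lam_nt, vol):
    """Equation (2.76): what remains after the factorials cancel."""
    n_m = n_dt + n_nt + n_ft
    return (p_d**n_dt * (1 - p_d) ** (n_tgt - n_dt)
            * (lam_ft * vol) ** n_ft * exp(-lam_ft * vol)
            * (lam_nt * vol) ** n_nt * exp(-lam_nt * vol)) / factorial(n_m)


args = (2, 1, 3, 4, 0.85, 0.3, 0.05, 10.0)  # hypothetical counts and parameters
print(isclose(product_form(*args), closed_form(*args)))  # True
```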

2.5.1.3 The Normalization Factor. The final piece of Equation (2.66), the normalization factor in the denominator, is evaluated in this subsection. Since the joint pdf of the entire measurement history, the current number of measurements, and the entire association event history is assumed to exist (as shown in the first line of Equation (2.66)), the normalization factor may be expanded using marginal probabilities as

$$
f\left(Z(k) \,\middle|\, Z^{k-1}, N_m(k)\right) = \sum_{i_k \in N_H(k)} \cdots \sum_{i_1 \in N_H(1)} f\left(Z(k), \Theta_{i_k}, \ldots, \Theta_{i_1} \,\middle|\, Z^{k-1}, N_m(k)\right).
$$
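Two applications of the law of conditional probability for pdfs/pmfs yield the desired result

$$
\begin{aligned}
f\left(Z(k) \,\middle|\, Z^{k-1}, N_m(k)\right) = \sum_{i_k \in N_H(k)} \cdots \sum_{i_1 \in N_H(1)} & f\left(Z(k) \,\middle|\, \Theta_{i_k}, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right) \\
& \cdot p\left(\Theta_{i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right) p\left(\{\Theta_{i_l}\}_{l=1}^{k-1} \,\middle|\, Z^{k-1}\right).
\end{aligned}
\tag{2.77}
$$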


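Finally, recall Equation (2.66), the joint conditional pmf of the current association event history:

$$
\begin{aligned}
p\left(\{\Theta_{i_l}\}_{l=1}^{k} \,\middle|\, Z^k\right)
&= \frac{f\left(\Theta_{i_k}, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z(k), Z^{k-1}, N_m(k)\right)}{f\left(Z(k), Z^{k-1}, N_m(k)\right)} \\
&= \frac{f\left(Z(k) \,\middle|\, \Theta_{i_k}, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right) p\left(\Theta_{i_k} \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k-1}, Z^{k-1}, N_m(k)\right) p\left(\{\Theta_{i_l}\}_{l=1}^{k-1} \,\middle|\, Z^{k-1}\right)}{f\left(Z(k) \,\middle|\, Z^{k-1}, N_m(k)\right)}.
\end{aligned}
\tag{2.66}
$$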

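Substituting Equations (2.68), (2.76), and (2.77) into this expression, canceling the term $(1/V_s)^{N_{FT,i_k}+N_{NT,i_k}}$ with the term $V_s^{N_{FT,i_k}+N_{NT,i_k}}$ in both the numerator and denominator, and canceling the terms $e^{-\lambda_{FT} V_s}$ and $e^{-\lambda_{NT} V_s}$ (together with the common factor $1/N_m(k)!$) yields the intermediate result

$$
p\left(\{\Theta_{i_l}\}_{l=1}^{k} \,\middle|\, Z^k\right) = \frac{\mathrm{NUM}}{\mathrm{DEN}}, \tag{2.78}
$$

where

$$
\begin{aligned}
\mathrm{NUM} &= \left[\prod_{j=1}^{N_{DT,i_k}} \frac{\exp\left[-\tfrac{1}{2}\, r_j^T(k)\, S_j^{-1}(k)\, r_j(k)\right]}{(2\pi)^{m/2}\sqrt{\det S_j(k)}}\right] P_D^{N_{DT,i_k}} (1-P_D)^{N_{TGT}-N_{DT,i_k}}\, \lambda_{FT}^{N_{FT,i_k}} \lambda_{NT}^{N_{NT,i_k}}\, p\left(\{\Theta_{i_l}\}_{l=1}^{k-1} \,\middle|\, Z^{k-1}\right), \\
\mathrm{DEN} &= \sum_{i_k \in N_H(k)} \cdots \sum_{i_1 \in N_H(1)} \left[\prod_{j=1}^{N_{DT,i_k}} \frac{\exp\left[-\tfrac{1}{2}\, r_j^T(k)\, S_j^{-1}(k)\, r_j(k)\right]}{(2\pi)^{m/2}\sqrt{\det S_j(k)}}\right] P_D^{N_{DT,i_k}} (1-P_D)^{N_{TGT}-N_{DT,i_k}}\, \lambda_{FT}^{N_{FT,i_k}} \lambda_{NT}^{N_{NT,i_k}}\, p\left(\{\Theta_{i_l}\}_{l=1}^{k-1} \,\middle|\, Z^{k-1}\right).
\end{aligned}
$$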
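As a minimal illustration of Equation (2.78), consider the simplest case: a single existing target ($N_{TGT} = 1$), target-oriented association ($N_{NT,i_k} = 0$), scalar measurements, and a single scan with no prior history, so that $p(\{\Theta_{i_l}\}_{l=1}^{k-1} | Z^{k-1}) = 1$. Each hypothesis either assigns one measurement to the target (the remaining measurements being false) or declares a missed detection. The sketch below (hypothetical names and numbers; the residuals and their variance are assumed given) evaluates NUM for every hypothesis and divides by DEN, their sum. The resulting weights have the same structure as PDAF association weights without gating.

```python
from math import exp, pi, sqrt


def gaussian_likelihood(residual, variance):
    """Scalar Gaussian density of a measurement residual with the given variance."""
    return exp(-0.5 * residual**2 / variance) / sqrt(2.0 * pi * variance)


def association_posteriors(residuals, variance, p_d, lam_ft):
    """NUM / DEN of Eq. (2.78) for one target, no new targets, one scan.

    Returns [P(missed detection), P(measurement 0 is the target), ...].
    """
    m = len(residuals)
    # missed detection: all m measurements are false (N_DT = 0, N_FT = m)
    weights = [(1.0 - p_d) * lam_ft**m]
    for r in residuals:
        # this measurement from the target (N_DT = 1), the other m - 1 false
        weights.append(gaussian_likelihood(r, variance) * p_d * lam_ft ** (m - 1))
    den = sum(weights)  # DEN: NUM summed over every hypothesis
    return [w / den for w in weights]


probs = association_posteriors([0.1, 2.5], variance=1.0, p_d=0.9, lam_ft=0.1)
print([round(p, 3) for p in probs])
```

The measurement with the small residual receives most of the probability, while the distant measurement and the missed-detection hypothesis share the remainder.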

The final result is obtained by substituting Equation (2.66) into Equation (2.65):

$$
f\left(X(k) \,\middle|\, Z^k\right) = \sum_{i_k \in N_H(k)} \cdots \sum_{i_1 \in N_H(1)} p\left(\{\Theta_{i_l}\}_{l=1}^{k} \,\middle|\, Z^k\right) f\left(X(k) \,\middle|\, \{\Theta_{i_l}\}_{l=1}^{k}, Z^k\right), \tag{2.65}
$$

which is a multivariate Gaussian mixture with $N_H(k) \cdot N_H(k-1) \cdots N_H(2) \cdot N_H(1)$ components. At each new scan, another summation is added to this equation, and the Bayesian solution cannot be implemented without approximation.
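The growth of the component count is easy to tabulate. A tiny sketch, assuming hypothetical per-scan hypothesis counts $N_H(1), N_H(2), \ldots$ and no mixture reduction:

```python
from itertools import accumulate
from operator import mul


def component_counts(hypotheses_per_scan):
    """Running number of components in Eq. (2.65) after each scan:
    N_H(1), N_H(1)*N_H(2), ..., with no mixture reduction."""
    return list(accumulate(hypotheses_per_scan, mul))


print(component_counts([3, 4, 4, 5]))  # [3, 12, 48, 240]
print(component_counts([4] * 10)[-1])  # 1048576 components after ten scans
```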

Approximating Equation (2.65) to trim computations while maintaining good performance is the focus of this thesis. Effectively, the approximation will reduce the original $N_H(k) \cdot N_H(k-1) \cdots N_H(2) \cdot N_H(1)$ Gaussian mixture components in Equation (2.65) to some manageable level. In practice, this reduction will be accomplished by a mixture reduction algorithm (MRA), which will reduce the $N_H(k)$ association events (or, equivalently, mixture components) to N^(k) association events at the end of each scan $k$.
2.5.2 Tracking with Measurement Origin Uncertainty Summary. Measurement origin uncertainty leads to the data association problem, in which the source of a measurement is ambiguous. One poses hypotheses about the possible source of each observed measurement as a first step towards solving this problem. Hypotheses are formed according to one of two assumptions used in practice about the potential origin of a measurement. The target-oriented data association approach assumes that the potential sources of measurements are existing tracks that were hypothesized in previous scans or false sources, while the measurement-oriented data association method supposes that measurements may also arise from potential new tracks in addition to existing tracks and false sources. Both data association approaches restrict the number of measurements generated by any single source to one.

A rigorous Bayesian solution for tracking multiple targets in the presence of measurement origin uncertainty using the measurement-oriented data association approach was presented in this section as a second step towards solving the data association problem. Association events were modeled as a discrete random process vector, $\Theta(k) = \Theta_{i_k}$, and were included with the joint target state random process composite vector $X(k)$ as another random quantity to be estimated. The resulting Bayesian solution was mathematically similar to that for the switching model case of Subsection 2.4.2, since the measurement origin uncertainty was modeled in the same manner as the kinematics model parameter uncertainty for switching models. Information about the detection capabilities of a tracking system and the expected number of false-origin and new target measurements generated by the tracker in a certain surveillance volume was embedded into the pmf of the association event history. Where appropriate, the steps necessary to convert the multiple-target, measurement-oriented data association Bayesian solution into the single-target, target-oriented data association solution were noted.

The final form of the solution is a Gaussian mixture with $N_H(k) \cdots N_H(1)$ components (as shown in Equation (2.65)). This solution is intractable, and some type of approximation is necessary to make the solution a viable candidate for practical implementation. Two fundamental approximations, applied at the end of each scan $k$, are to reduce the Gaussian mixture to a single component, which is the approach used by PDAF and JPDAF, or to a lower number of components, which is the approximation used by the Joining and Clustering and the Integral Square Error cost-function-based MRAs. The focus of this thesis is on the latter kind of approximation, specifically the type of MRA introduced by Williams in [38][40][41].

2.6 Summary

This chapter introduced target tracking as a means of determining the state of targets over some time interval of interest from observations of the targets in the presence of uncertainty. Two basic sources of uncertainty are the mathematical models of the targets' dynamics, which may, at best, only approximate the true motion of the targets and may change substantially from one time instant to another, and sensor noise, which corrupts measurements. If the target kinematics model and measurement model are linear, and all random quantities are modeled as Gaussian, then the Kalman filter, which is a linear recursive Bayesian filter, provides the optimal mean and covariance estimates of the target state random process vector under almost all practical criteria, conditioned on an assumed measurement association history and an assumed target dynamics model. If either model is nonlinear, then a nonlinear recursive Bayesian filter is used instead. However, the nonlinear filter will not, in general, produce an optimal estimate of the target state process vector.

Gaussian mixtures result from using Bayes estimation to solve target tracking problems in which kinematics model parameter uncertainty and measurement origin uncertainty exist. The general form of a multivariate Gaussian mixture pdf was presented, and equations for calculating the overall mean and covariance of a target state random process vector described by a multivariate Gaussian mixture pdf were provided. Also, the effects of merging or deleting mixture components on the resulting Gaussian mixture during a mixture reduction process were described.
Bayesian solutions for the target state pdf in the presence of kinematics model parameter$^{20}$ and measurement origin uncertainty were also derived using Bayes estimation. Solutions that modeled the uncertainty as a random vector were tractable, while Bayesian solutions that represented the uncertainty as a random process vector were intractable, and approximation was necessary to implement the solutions. In fact, new methods for approximating the rigorous Bayesian solution for the target state Gaussian mixture pdf in the presence of measurement origin uncertainty are the focus of this thesis, and such methods will be presented in subsequent chapters.


$^{20}$Kinematics model parameter uncertainty should not be confused with uncertainty in a kinematics model due to mathematically modeling a target's dynamics. The former source of uncertainty arises from a designer not knowing which model to use, while the latter source of uncertainty is a simple admission that mathematical equations cannot exactly describe the realistic motion of an object.
III. Estimating Probability Density Functions

In [38], Williams derived the Maximum Likelihood measure, which he believes “. . . is probably the most physically meaningful cost function for this application [Gaussian mixture reduction].” Because one aspect of the goal for this thesis is to develop a new mixture reduction algorithm that outperforms any previously published algorithm, Williams' endorsement of the Maximum Likelihood measure is the motivation behind this chapter. As such, Chapter III explores the techniques of pdf estimation drawn from statistical inference, which is based on the well-developed fields of the mathematical theory of probability and mathematical statistics (including maximum likelihood estimation). Given a set of random observations, one tries to “infer” the underlying distribution that spawned these samples. In some cases this distribution is known, or at least assumed known, to be of a certain type (Gaussian, Poisson, etc.), and the task is to estimate the parameter or parameters of the distribution (e.g., the mean and variance for a Gaussian density, the rate for a Poisson distribution, etc.). In other cases the distribution may be known (or assumed known) to belong to some set of possible distributions, and the task is then to identify the correct one from the set and estimate the parameter or parameters of this distribution.

Although pdf estimation may not seem directly applicable to the purpose of approximating a Gaussian mixture with one containing a lower number of components, it is useful to explore the concepts and techniques of this field in the hope of gaining insights into an appropriate method for mixture approximation (such as the Maximum Likelihood measure that Williams developed in [38]). Generally, there are at least two methods of estimation that can be applied to this problem: maximum likelihood estimation (MLE) and Bayesian estimation. Both methods attempt to estimate one or more parameters of a presumed pdf, but they differ in application. MLE is used when the pdf parameters are deterministic (fixed but unknown), while Bayesian estimation can handle the case in which the parameters are random. In this thesis, the parameters of a Gaussian mixture will be modeled as deterministic quantities, and this chapter uses MLE to solve for these parameters$^{1}$. This chapter also introduces an iterative implementation of MLE called the Expectation Maximization (EM) algorithm.

The following sections are not intended to be a comprehensive narrative of MLE methods, but rather an introduction highlighting certain aspects of this approach that may shed light on approximating a full-component Gaussian mixture with one having a lower number of mixture components. Complete treatments of MLE are presented in pU|[IT([251S2], from a mathematical statistics perspective, and [21][22][35][36], from an engineering viewpoint.


3.1  Maximum  Likelihood  Estimation 


The method of maximum likelihood can be traced back to Gauss but was not applied to general estimation problems until R. A. Fisher published a short paper on the topic in 1912 [10]. Over the next thirty years Cramer [10], Rao [25], and others developed this method in a more formal mathematical manner. As their work popularized the method of maximum likelihood, it became commonly known as maximum likelihood estimation (MLE) and became the preferred way to estimate the deterministic parameters of a pdf.

MLE is ideally suited to estimating one or more deterministic parameters of a pdf when independent samples are drawn from a distribution of known form. That is, given a set of independent identically distributed (i.i.d.) observations of a random quantity for which the mathematical form of the pdf is known (except for a certain number of parameters), the MLE approach may be used to find an estimate of the pdf parameters. This estimate is a random quantity since it is a function of the random observations, and under the Cramer-Rao (C.R.) regularity conditions (see [42],


$^{1}$In Chapter II, Bayesian estimation was used to estimate the target state since it was modeled as a random process vector. However, this chapter does not consider Bayesian estimation for the Gaussian mixture parameters since they are modeled as unknown deterministic quantities.
pp. 182-183), the estimate has the following asymptotic qualities (i.e., as the number of observations becomes large) [10][25][42]:

•  converging to the true value of the parameters,

•  converging to a Gaussian distribution, and

•  efficiency (the estimate attains the minimum possible variance).

A detailed derivation of these properties is contained in Subsection 3.1.1 to help the reader better understand the asymptotic qualities of MLE. For instance, the derivation will show that the first property listed above is not always true, since the maximization may converge only to one of a number of local maxima.

As an example, consider the problem of estimating a single parameter $a$ of a pdf $f(z|a)$ when a set of $n$ i.i.d. observations $\{z_i\}_1^n$ is made. The likelihood function is defined as

$$
L(\{z_i\}_1^n; a) = \prod_{i=1}^{n} f(z_i|a) \tag{3.1}
$$

since the observations are i.i.d. (i.e., they are independent, so the joint density of the $n$ observations is just the product of the separate marginal densities, and each marginal density has the same form). The method of maximum likelihood consists simply of maximizing this expression over $a$ in the open interval $A$ ($a$ is not an endpoint). In theory, the estimate from this maximization will converge to the true value of the parameter as the number of observations grows; however, there is no guarantee that the maximum found is global, and the true value of the parameter may not be so easily found. This fact will become evident in the proof that follows.
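To make Equation (3.1) concrete, the sketch below assumes, for illustration only, the exponential pdf $f(z|a) = a e^{-az}$, $z \ge 0$, whose likelihood has the closed-form maximizer $a^* = n / \sum_i z_i$. It maximizes the log of (3.1) numerically with a golden-section search (a generic choice for a unimodal function, not a method from the text) and compares the two answers. The sample size, seed, and true value $a_0 = 2$ are arbitrary.

```python
import math
import random


def log_likelihood(a, samples):
    """ln L({z_i}; a) for the exponential pdf f(z|a) = a*exp(-a*z)."""
    return sum(math.log(a) - a * z for z in samples)


def golden_section_max(fun, lo, hi, tol=1e-10):
    """Maximizer of a unimodal function on (lo, hi) by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = fun(x1), fun(x2)
    while hi - lo > tol:
        if f1 < f2:  # the maximizer lies in (x1, hi)
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = fun(x2)
        else:  # the maximizer lies in (lo, x2)
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = fun(x1)
    return 0.5 * (lo + hi)


rng = random.Random(1)
a_true = 2.0
samples = [rng.expovariate(a_true) for _ in range(500)]
a_numeric = golden_section_max(lambda a: log_likelihood(a, samples), 1e-3, 20.0)
a_closed = len(samples) / sum(samples)  # root of d/da ln L = n/a - sum(z) = 0
print(a_numeric, a_closed)
```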

3.1.1 Asymptotic Properties of MLE. The three asymptotic qualities of MLE will be proven for a single parameter in a derivation following [10][25][42]. But before continuing, it is necessary to introduce some preliminary information. First, the product form of $L(\cdot)$ is converted into a summation so that useful convergence theorems may be invoked. Since $\ln \prod_{i=1}^{n} y_i = \sum_{i=1}^{n} \ln(y_i)$ and $\ln(\cdot)$ is a monotonically increasing function, the value of $a$ that maximizes Equation (3.1) also maximizes the log-likelihood function

$$
\ln L(\{z_i\}_1^n; a) = \sum_{i=1}^{n} \ln f(z_i|a). \tag{3.2}
$$

By solving the likelihood equation

$$
\frac{\partial \ln L(\{z_i\}_1^n; a)}{\partial a} = 0 \tag{3.3}
$$

for $a$, one obtains the maximum likelihood estimate of the parameter $a$. Second, the following conditions are imposed [10][25][42]:


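(i) $a \in A \subseteq \mathbb{R}$, $a$ not an endpoint of $A$.

(ii) For almost all $z$, $\ln f(z|a)$ is analytic$^{2}$ in $a$.

(iii) The observations $z_i$ are i.i.d.

(iv) For every $a \in A$,

$$
\int_{-\infty}^{\infty} \left(\frac{\partial \ln f(z|a)}{\partial a}\right)^{2} f(z|a_0)\,dz = k^2 < \infty, \tag{3.4}
$$

where $a_0$ is the true value of the parameter and $k > 0$.

(v) All moments of $\partial^i \ln f(z|a)/\partial a^i$ are finite; i.e.,

$$
E\left\{\frac{\partial^i \ln f(z|a)}{\partial a^i}\right\} < \infty, \quad i = 1, 2, \ldots \tag{3.5}
$$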


Third, the following identities will be used.

(a) By the chain rule of calculus,

$$
\frac{\partial \ln f(z|a)}{\partial a} = \frac{1}{f(z|a)}\,\frac{\partial f(z|a)}{\partial a}. \tag{3.6}
$$

$^{2}$An analytic function has finite-valued derivatives of every order [3].
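(b) The first and second partial derivatives of $\int f(z|a)\,dz$ with respect to $a$ are zero, since

$$
\int_{-\infty}^{\infty} f(z|a)\,dz = 1
$$

for every $a$, and differentiating this expression with respect to $a$ once and twice yields

$$
\int_{-\infty}^{\infty} \frac{\partial f(z|a)}{\partial a}\,dz = 0, \qquad \int_{-\infty}^{\infty} \frac{\partial^2 f(z|a)}{\partial a^2}\,dz = 0. \tag{3.7}
$$

(c) $E\left\{\left[\frac{\partial^2}{\partial a^2} \ln f(z|a)\right]_{a=a_0}\right\} = -E\left\{\left[\frac{\partial}{\partial a} \ln f(z|a)\right]^2_{a=a_0}\right\}$, since

$$
\begin{aligned}
E\left\{\left[\frac{\partial^2 \ln f(z|a)}{\partial a^2}\right]_{a=a_0}\right\}
&= E\left\{\left[\frac{1}{f(z|a)}\frac{\partial^2 f(z|a)}{\partial a^2} - \left(\frac{1}{f(z|a)}\frac{\partial f(z|a)}{\partial a}\right)^2\right]_{a=a_0}\right\} \\
&= E\left\{\left[\frac{1}{f(z|a)}\frac{\partial^2 f(z|a)}{\partial a^2}\right]_{a=a_0}\right\} - E\left\{\left[\frac{1}{f(z|a)}\frac{\partial f(z|a)}{\partial a}\right]^2_{a=a_0}\right\} \\
&= -E\left\{\left[\frac{1}{f(z|a)}\frac{\partial f(z|a)}{\partial a}\right]^2_{a=a_0}\right\} \\
&= -E\left\{\left[\frac{\partial \ln f(z|a)}{\partial a}\right]^2_{a=a_0}\right\}.
\end{aligned}
\tag{3.8}
$$

The first expectation on the second line vanishes because, with the expectation taken under $f(z|a_0)$, it equals $\int [\partial^2 f(z|a)/\partial a^2]_{a=a_0}\,dz = 0$ by identity (b); the last line follows from identity (a).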
Note that the derivative can be taken inside the integral since it is with respect to the parameter $a$ and not $z$ (this can be shown by a limit argument; see pp. 66-68 of [10]). With this information in mind, the asymptotic qualities of MLE will now be derived.
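Identities (b) and (c), together with the constant $k^2$ of condition (iv), can be checked numerically for a specific model. The sketch below assumes a Poisson model $f(z|a) = a^z e^{-a}/z!$, a discrete example chosen for illustration, so sums replace the integrals. For this model $\partial \ln f/\partial a = z/a - 1$ and $\partial^2 \ln f/\partial a^2 = -z/a^2$. The support is truncated where the remaining probability is negligible.

```python
from math import exp, lgamma, log


def poisson_pmf(z, a):
    """f(z|a) = a^z e^{-a} / z!, computed in log space for stability."""
    return exp(z * log(a) - a - lgamma(z + 1))


def score(z, a):
    """First derivative of ln f(z|a) with respect to a."""
    return z / a - 1.0


def second_derivative(z, a):
    """Second derivative of ln f(z|a) with respect to a."""
    return -z / a**2


a0 = 3.0
support = range(200)  # truncated support; the neglected tail is negligible for a0 = 3
mean_score = sum(score(z, a0) * poisson_pmf(z, a0) for z in support)  # zero by (a), (b)
k_squared = sum(score(z, a0) ** 2 * poisson_pmf(z, a0) for z in support)  # condition (iv)
mean_second = sum(second_derivative(z, a0) * poisson_pmf(z, a0) for z in support)  # (c)
print(mean_score, k_squared, mean_second)
```

For this model $k^2 = 1/a_0$, and the expected second derivative equals $-k^2$, as identity (c) requires.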

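To show that MLE converges in probability to the true parameter, $a_0$, consider the parameter values $a = a_0 \pm \delta$, where $\delta$ is some arbitrarily small positive number. By Jensen's inequality [25] (the inequality is strict provided that $f(z|a_0 \pm \delta)$ differs from $f(z|a_0)$ on a set of positive probability),

$$
\begin{aligned}
\int_{-\infty}^{\infty} \ln\left(\frac{f(z|a_0)}{f(z|a_0 \pm \delta)}\right) f(z|a_0)\,dz &> 0 \\
\int_{-\infty}^{\infty} \ln f(z|a_0)\, f(z|a_0)\,dz - \int_{-\infty}^{\infty} \ln f(z|a_0 \pm \delta)\, f(z|a_0)\,dz &> 0 \\
\int_{-\infty}^{\infty} \ln f(z|a_0)\, f(z|a_0)\,dz &> \int_{-\infty}^{\infty} \ln f(z|a_0 \pm \delta)\, f(z|a_0)\,dz \\
E\{\ln f(z|a_0)\} &> E\{\ln f(z|a_0 \pm \delta)\}.
\end{aligned}
\tag{3.9}
$$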
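Inequality (3.9) says that the expected log-density under the true pdf is largest at the true parameter. A minimal numerical check follows, assuming (for illustration) a Gaussian pdf with unknown mean $a$ and known standard deviation $\sigma$. For this model the gap $E\{\ln f(z|a_0)\} - E\{\ln f(z|a_0 \pm \delta)\}$ equals $\delta^2/(2\sigma^2)$. The integral is approximated by the midpoint rule over $a_0 \pm 12\sigma$, and the values of $a_0$ and $\sigma$ are arbitrary.

```python
from math import exp, log, pi, sqrt


def log_gauss(z, mean, sigma):
    """ln f(z|mean) for a Gaussian with known standard deviation sigma."""
    return -0.5 * ((z - mean) / sigma) ** 2 - log(sqrt(2.0 * pi) * sigma)


def expected_log_density(a, a0, sigma, steps=20000):
    """E{ln f(z|a)} with z distributed as f(z|a0), by the midpoint rule."""
    lo, hi = a0 - 12.0 * sigma, a0 + 12.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += log_gauss(z, a, sigma) * exp(log_gauss(z, a0, sigma)) * h
    return total


a0, sigma = 1.0, 2.0
for delta in (0.5, -0.5, 2.0):
    gap = expected_log_density(a0, a0, sigma) - expected_log_density(a0 + delta, a0, sigma)
    print(delta, gap, delta**2 / (2 * sigma**2))  # gap is positive and equals delta^2/(2 sigma^2)
```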


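Invoking the strong law of large numbers$^{3}$, the expectations are the almost-sure limits of the corresponding sample averages, so for sufficiently large $n$

$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} \ln f(z_i|a_0) &> \frac{1}{n}\sum_{i=1}^{n} \ln f(z_i|a_0 \pm \delta) \\
\ln L(\{z_i\}_1^n; a_0) &> \ln L(\{z_i\}_1^n; a_0 \pm \delta).
\end{aligned}
\tag{3.10}
$$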


This equation shows that, for almost all sample sequences $\{z_i\}_1^n$, $\ln L(\{z_i\}_1^n; a_0)$ will be greater than $\ln L(\{z_i\}_1^n; a_0 \pm \delta)$. Since $\ln f(z|a)$ is analytic by condition (ii), it is differentiable and continuous for all $a$, so that there is a stationary point (the derivative


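$^{3}$If $y$ is a random variable with finite variance and $y_1, y_2, \ldots$ are i.i.d. copies of $y$, then

$$
\frac{1}{n}\sum_{i=1}^{n} y_i \xrightarrow{\text{a.s.}} E\{y\} \quad \text{as } n \to \infty,
$$

where a.s. means “almost surely,” or “with probability 1.”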


of the log-likelihood function in Equation (3.3) is zero at this point) within the region $(a_0 - \delta, a_0 + \delta)$. If the stationary point is $a^*(\{z_i\}_1^n) \in (a_0 - \delta, a_0 + \delta)$, then by letting $\delta \to 0$, $a^*(\{z_i\}_1^n) \to a_0$. So MLE converges with probability 1 to a maximum of the likelihood function (3.1), which may or may not be the global maximum.


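Given the existence of a solution, one can show that this solution is asymptotically normally distributed. Using a Taylor series expansion$^{4}$ about $a_0$, the true value of the parameter, Equation (3.3) may be written as (with $a^*(\{z_i\}_1^n)$ simply represented by $a^*$)

$$
\sum_{i=1}^{n}\left(\frac{\partial \ln f(z_i|a)}{\partial a}\right)_{a=a^*}
= \sum_{i=1}^{n}\left(\frac{\partial \ln f(z_i|a)}{\partial a}\right)_{a=a_0}
+ (a^* - a_0)\sum_{i=1}^{n}\left(\frac{\partial^2 \ln f(z_i|a)}{\partial a^2}\right)_{a=a_0}
+ \text{H.O.T.} = 0.
$$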


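The term H.O.T. represents higher order terms in $(a^* - a_0)$. Next, multiply both sides of this equation by $1/n$:

$$
\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial \ln f(z_i|a)}{\partial a}\right)_{a=a_0}
+ (a^* - a_0)\,\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial^2 \ln f(z_i|a)}{\partial a^2}\right)_{a=a_0}
+ \frac{1}{n}\,\text{H.O.T.} = 0.
$$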


$^{4}$The Taylor series expansion of $f(x)$ about the point $x_0$ is

$$
f(x) = \sum_{i=0}^{\infty} \frac{f^{(i)}(x_0)}{i!}\,(x - x_0)^i.
$$
By setting $T_1$ equal to the second summation term$^{5}$ scaled by $1/n$, and absorbing the $1/n$ scalar into the H.O.T. term, this equation may be written as

$$
\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial \ln f(z_i|a)}{\partial a}\right)_{a=a_0} + (a^* - a_0)\,T_1 + \text{H.O.T.} = 0.
$$

After a few lines of algebra this equation becomes

$$
k\sqrt{n}\,(a^* - a_0) = \frac{\dfrac{1}{k\sqrt{n}}\displaystyle\sum_{i=1}^{n}\left(\frac{\partial \ln f(z_i|a)}{\partial a}\right)_{a=a_0}}{-\left(\dfrac{T_1}{k^2} + \text{H.O.T.}\right)} \tag{3.11}
$$


(recall that $k$ is the standard deviation of the partial derivative with respect to $a$ of the true log-likelihood $\ln f(z|a)$, as given in Equation (3.4)). Finally, take the limit

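$^{5}T_1$ (“$T_1$” is used to represent the first order term of the expansion) converges to the left-hand side of Equation (3.8),

$$
T_1 \to E\left\{\left[\frac{\partial^2}{\partial a^2} \ln f(z|a)\right]_{a=a_0}\right\} = -E\left\{\left[\frac{\partial}{\partial a} \ln f(z|a)\right]^2_{a=a_0}\right\} = -k^2,
$$

in the limit as $n$ approaches infinity, by the strong law of large numbers.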


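as the number of samples $n$ grows to infinity:

$$
\begin{aligned}
\lim_{n\to\infty} k\sqrt{n}\,(a^* - a_0) &= \frac{\lim_{n\to\infty} \dfrac{1}{k\sqrt{n}} \displaystyle\sum_{i=1}^{n}\left(\frac{\partial \ln f(z_i|a)}{\partial a}\right)_{a=a_0}}{-\left(\lim_{n\to\infty} \dfrac{T_1}{k^2} + \lim_{n\to\infty} \text{H.O.T.}\right)} \\
\lim_{n\to\infty} a^* &= \lim_{n\to\infty} a_0 + \frac{\lim_{n\to\infty} \dfrac{1}{k\sqrt{n}} \displaystyle\sum_{i=1}^{n}\left(\frac{\partial \ln f(z_i|a)}{\partial a}\right)_{a=a_0}}{\lim_{n\to\infty} k\sqrt{n}\left[-\left(\lim_{n\to\infty} \dfrac{T_1}{k^2} + \lim_{n\to\infty} \text{H.O.T.}\right)\right]} \\
\lim_{n\to\infty} a^* &= a_0 + \frac{\lim_{n\to\infty} \dfrac{1}{k^2 n} \displaystyle\sum_{i=1}^{n}\left(\frac{\partial \ln f(z_i|a)}{\partial a}\right)_{a=a_0}}{-\left(\lim_{n\to\infty} \dfrac{T_1}{k^2} + \lim_{n\to\infty} \text{H.O.T.}\right)} \\
\lim_{n\to\infty} a^*(\{z_i\}_1^n) &\sim \mathcal{N}\!\left(0, \left(\frac{k\sqrt{n}}{k^2 n}\right)^2\right) + a_0 = \mathcal{N}\!\left(a_0, \left(\frac{1}{k\sqrt{n}}\right)^2\right).
\end{aligned}
\tag{3.12}
$$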


The final steps in the derivation of Equation (3.12) require an explanation. The term $\lim_{n\to\infty} a_0$ in the second line is simply $a_0$, since the true value of the parameter is not a function of the number of samples $n$. The denominator in the third line contains the term $\lim_{n\to\infty} T_1$, which converges to $-k^2$ by the strong law of large numbers, condition (iv), and identity (c). The other term in the denominator, $\lim_{n\to\infty} \text{H.O.T.}$, is zero, since these higher order terms are sample averages of higher order derivatives of $\ln f(z|a)$, which remain finite as $n \to \infty$ by condition (v), and each of these terms is scaled by a positive power of $(a^* - a_0)$, which converges to zero as $n \to \infty$. Therefore, the denominator converges to $-(-k^2/k^2 + 0) = 1$ with probability 1. Finally, the central limit theorem states that a sum of $n$ i.i.d. random variables $y_i$ with finite mean $E\{y\} = \bar{y}$ and variance $E\{(y - \bar{y})^2\} = \sigma^2$ is, as the number of samples becomes large, approximately normally distributed with mean $n\bar{y}$ and variance $n\sigma^2$. The summands in the numerator of the third line have mean zero (by identities (a) and (b)) and variance $k^2$ (by condition (iv)), so applying this theorem leads to the conclusion that the second term on the right-hand side of the third line is a normally distributed random variable with a mean of zero and a variance of $(k\sqrt{n}/(k^2 n))^2$. The last line then follows from the third line, and the MLE estimate $a^*(\{z_i\}_1^n)$ is in fact asymptotically normally distributed with mean $a_0$ and variance $1/(nk^2)$.
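The asymptotic normality result can be illustrated by simulation. The sketch below assumes, for illustration, the exponential pdf $f(z|a) = a e^{-az}$, for which $a^* = n/\sum_i z_i$ and, from condition (iv), $k^2 = E\{(1/a_0 - z)^2\} = 1/a_0^2$. If Equation (3.12) holds, the standardized errors $k\sqrt{n}(a^* - a_0)$ should have a mean near 0, a variance near 1, and about 95% of their values inside $\pm 1.96$. The sample size, number of trials, and seed are arbitrary choices.

```python
import random
import statistics
from math import sqrt


def standardized_mle_errors(a0, n, trials, seed=0):
    """k*sqrt(n)*(a* - a0) for repeated samples of size n from a*exp(-a*z)."""
    rng = random.Random(seed)
    k = 1.0 / a0  # k^2 = E{(d/da ln f)^2} = 1/a0^2 for the exponential pdf
    errors = []
    for _ in range(trials):
        total = sum(rng.expovariate(a0) for _ in range(n))
        a_star = n / total  # root of the likelihood equation (3.3)
        errors.append(k * sqrt(n) * (a_star - a0))
    return errors


errs = standardized_mle_errors(a0=2.0, n=400, trials=2000)
inside = sum(abs(e) < 1.96 for e in errs) / len(errs)
print(statistics.fmean(errs), statistics.pvariance(errs), inside)
```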


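Finally, the third asymptotic quality of MLE, efficiency, can be shown in two ways. The easiest way to argue that the MLE estimate $a^*(\{z_i\}_1^n)$ is asymptotically efficient is to note that its variance is inversely proportional to the number of samples $n$, so as this number increases, the variance decreases; in fact, the variance is zero in the asymptotic limit as $n \to \infty$. (Strictly speaking, a vanishing variance establishes consistency; efficiency additionally requires that the variance attain the smallest value possible, which the second method addresses.) The second method to show the asymptotic efficiency of the MLE estimate is to use Cramer's definition of an asymptotically efficient estimator [10]:

$$
\lim_{n\to\infty} \operatorname{eff}\left(a^*(\{z_i\}_1^n)\right)
= \frac{1}{(1/k)^2\, E\left\{\left[\dfrac{\partial \ln f(z|a)}{\partial a}\right]^2_{a=a_0}\right\}}
= \frac{1}{k^2/k^2} = 1. \tag{3.13}
$$

Here $(1/k)^2 = n \operatorname{var}(a^*)$ is the normalized asymptotic variance from Equation (3.12), so Equation (3.13) is the ratio of the Cramer-Rao lower bound $1/(nk^2)$ to the variance of the estimate. The efficiency function $\operatorname{eff}(\cdot)$ is used to determine the efficiency of an estimate. By taking the limit as $n$ approaches infinity of the efficiency function, one may calculate the asymptotic efficiency of an estimate. A value of 1 indicates that the estimate is the most efficient estimate possible. Therefore, $a^*(\{z_i\}_1^n)$ is asymptotically the most efficient estimate of $a$.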
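Efficiency is easiest to appreciate by comparison. The sketch below keeps the exponential model $f(z|a) = a e^{-az}$ (an assumption for illustration, with $k^2 = 1/a_0^2$) and compares the MLE $a^* = n/\sum_i z_i$ with an alternative estimator $\ln 2 / \operatorname{median}(z)$, introduced here only for contrast; it is consistent because the median of this pdf is $\ln 2 / a$. Each estimator's efficiency is estimated as the Cramer-Rao lower bound $1/(nk^2) = a_0^2/n$ divided by its simulated variance. The MLE should come out close to 1 and the median-based estimator noticeably lower (its asymptotic efficiency is $(\ln 2)^2 \approx 0.48$).

```python
import random
import statistics
from math import log


def simulated_variances(a0, n, trials, seed=1):
    """Sample variances of the MLE and of a median-based estimator of a0."""
    rng = random.Random(seed)
    mle, med = [], []
    for _ in range(trials):
        z = [rng.expovariate(a0) for _ in range(n)]
        mle.append(n / sum(z))  # MLE: 1 / sample mean
        med.append(log(2.0) / statistics.median(z))  # uses median = ln 2 / a
    return statistics.pvariance(mle), statistics.pvariance(med)


a0, n = 2.0, 200
var_mle, var_med = simulated_variances(a0, n, trials=2000)
crlb = a0**2 / n  # 1 / (n k^2) with k^2 = 1 / a0^2
print(crlb / var_mle, crlb / var_med)  # near 1 and near (ln 2)^2, respectively
```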


3.1.2  MLE  Measure  Function.  The  method  of  maximum  likelihood  can  be 
related  to  the  problem  of  approximating  one  pdf  with  another,  which  is  the  goal  of 
this  thesis.  Again  consider  the  problem  of  estimating  a  single  pdf  parameter  a  given  a 
set  of  n  i.i.d.  samples.  The  log-likelihood  function  is  given  in  Equation  (3.2)  and  the 
likelihood  equation  is  given  by  Equation  (3.3) .  Since  the  original  (true)  pdf,  f(z\a0), 
is  known,  we  need  to  relate  this  pdf  to  an  approximate  pdf  based  on  the  likelihood 
equation.  To  do  so,  begin  with  the  log-likelihood  equation  and  set  the  derivative  equal 
to  some  number,  c,  when  some  approximated  value  of  the  parameter,  a,  is  input  into 
the  equation.  Next,  multiply  both  sides  of  this  equation  by  1  jn  and  take  the  limit  as 
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n 


oo, 


lim 

n—>oc  n  oa 
i=  1 


=  lim  — c  =  0 

n— kx)  77, 


E<,  / <91n/0|a) 
\  (9a 


=  0. 


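Since the expectation is taken with respect to the true pdf, this equation may be rewritten in integral form as

$$\int_{-\infty}^{\infty}\frac{1}{f(z|\hat{a})}\,\frac{\partial f(z|a)}{\partial a}\bigg|_{a=\hat{a}}\,f(z|a_0)\,dz = 0 \tag{3.14}$$

where $f(z|a_0)$ is the true density and $f(z|\hat{a})$ is the approximate pdf. This MLE measure function may be used to evaluate the "fit" of $f(z|\hat{a})$ to $f(z|a_0)$.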
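To make Equation (3.14) concrete, the following sketch evaluates the MLE measure function by numerical quadrature for an assumed Gaussian-mean model, $f(z|a) = N(z; a, \sigma^2)$, where $\partial \ln f(z|a)/\partial a = (z - a)/\sigma^2$ and the integral has the closed form $(a_0 - \hat{a})/\sigma^2$. The model, the midpoint rule, and the integration limits are illustrative assumptions, not the thesis's method.

```python
import math


def gauss_pdf(z, mean, var):
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mle_measure(a_hat, a0, var=1.0, lo=-20.0, hi=20.0, steps=20000):
    """Midpoint-rule approximation of Eq. (3.14) for f(z|a) = N(z; a, var).

    The integrand (1/f) df/da * f(z|a0) equals the score (z - a_hat)/var
    weighted by the true pdf f(z|a0).
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h
        score = (z - a_hat) / var          # d ln f(z|a)/da evaluated at a_hat
        total += score * gauss_pdf(z, a0, var) * h
    return total


print(mle_measure(a_hat=1.0, a0=1.0))   # approximately 0: perfect fit
print(mle_measure(a_hat=0.0, a0=1.0))   # approximately (a0 - a_hat)/var = 1
```

The measure is zero only when the approximate parameter matches the true one, which is the property that makes it usable as a fit criterion.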


3.2  Expectation  Maximization 

Generally, solving the likelihood equation given by Equation (3.3) for the MLE of the parameter may require solving nonlinear equations. In particular, for the problem of finding the parameters of a mixture density, the single likelihood equation usually becomes a set of nonlinear equations without an analytic solution [26]. Instead, an approximate solution is sought by an iterative approach such as Newton's method or a variant of it, such as Rao's method of "scoring" [22], [26]. As an alternative to these Newton-like approaches, the Expectation Maximization (EM) algorithm is another iterative approach to solving for the MLE of the parameters. The EM algorithm offers a number of desirable qualities, including reliable convergence to the MLE of the parameters and computational tractability [26]. However, convergence may be slow even when the algorithm is applied to relatively simple problems [26].

The EM algorithm is useful when applied to MLE problems in which maximizing a "complete data" log-likelihood function is easier than maximizing the observed "incomplete data" log-likelihood function [26]. An example will be given in Subsection 3.2.2 to illustrate this point. The observations may be viewed as incomplete data in that they are drawn from a sample space that is the image of a subset of the complete data sample space [11], [26]. To clarify this point, Figure 3.1 depicts the sample space of the complete data, $z$, as $Z$ and the sample space of the observed incomplete data, $y$, as $Y$. The observed incomplete data sample space is actually the mapped image of a subset of the complete data sample space, $Z(y)$, which is mapped according to the many-to-one (i.e., non-invertible) transformation $y(z)$ [11].

Figure 3.1: Sample spaces for the EM algorithm derivation.
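The many-to-one mapping $y(z)$ and the subset $Z(y)$ can be illustrated with the faulty five-sensor array of Subsection 3.2.2, where the first two sensor outputs are summed. The sketch below enumerates $Z(y)$ for non-negative integer counts; the helper names `y_of_z` and `preimage` and the small count vector are hypothetical.

```python
def y_of_z(z):
    """Many-to-one map y(z): the first two sensor outputs are summed,
    the remaining three pass through unchanged."""
    z1, z2, z3, z4, z5 = z
    return (z1 + z2, z3, z4, z5)


def preimage(y):
    """Enumerate Z(y) = {z : y(z) = y} for non-negative integer counts."""
    y1, y2, y3, y4 = y
    return [(z1, y1 - z1, y2, y3, y4) for z1 in range(y1 + 1)]


y = (3, 1, 0, 2)
Zy = preimage(y)
print(len(Zy))                          # 4 complete-data vectors share this y
print(all(y_of_z(z) == y for z in Zy))  # True
```

Because several complete-data vectors share one observation, $y$ alone cannot recover $z$; the E-step replaces the unknown $z$ by its conditional expectation over $Z(y)$.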


3.2.1 Theoretical Derivation of the EM Algorithm. The EM algorithm for observations from an exponential family (e.g., Gaussian, Poisson, and Multinomial) is derived from a theoretical perspective in this section. This derivation, including elaboration on certain details, follows that of Dempster et al. [11], who first drew widespread attention to the algorithm and developed a generalized version. The final result, Equation (3.17), is a direct consequence of the MLE approach under certain conditions and had been noted in various articles prior to the publication of [11].

The pdf of the complete data sample space is $f(z|\alpha)$ and the pdf of the observed incomplete data sample space is $g(y|\alpha)$, where $\alpha$ is the vector of parameters of the pdfs. These densities are related by⁶

$$g(y|\alpha) = \int_{z \notin Z(y)} f(z|\alpha)\,dz. \tag{3.15}$$


Considering only exponential families of pdfs as candidates for $f(\cdot)$, the complete data pdf has the form

$$f(z|\alpha) = \frac{b(z)\,e^{\alpha^T t(z)}}{a(\alpha)} \tag{3.16}$$

where $t(z)$ represents the vector of complete data sufficient statistics⁷, $b(z)$ is a non-negative scalar function of $z$, and $a(\alpha)$ is a normalization scalar [9]. The aim of the EM algorithm is to maximize $g(y|\alpha)$ by appropriate choice of $\alpha$, using the complete data pdf $f(z|\alpha)$.
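As an illustration of the form in Equation (3.16), the Poisson pmf can be written with natural parameter $\alpha = \ln\lambda$, $b(z) = 1/z!$, $t(z) = z$, and $a(\alpha) = \exp(e^{\alpha})$. This standard mapping is used here only as an assumed example; the sketch checks numerically that the two forms agree.

```python
import math


def poisson_pmf(z, lam):
    return math.exp(-lam) * lam ** z / math.factorial(z)


def poisson_expfam(z, alpha):
    """Poisson pmf in the exponential-family form of Eq. (3.16).

    Assumed mapping: alpha = ln(lambda), b(z) = 1/z!, t(z) = z,
    a(alpha) = exp(e^alpha).
    """
    b = 1.0 / math.factorial(z)
    t = z
    a = math.exp(math.exp(alpha))
    return b * math.exp(alpha * t) / a


lam = 3.2
for z in range(6):
    assert math.isclose(poisson_pmf(z, lam), poisson_expfam(z, math.log(lam)))
print("Eq. (3.16) form matches the Poisson pmf")
```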


The derivation of the EM algorithm for exponential families of distributions begins by noting that, using conditional probability, the complete data pdf may be written in terms of the complete data pdf conditioned on the observed incomplete data [11], [26]:

$$f(z|\alpha) = f(z|y,\alpha)\,f(y|\alpha),$$

since $y$ is related to $z$ through the transformation $y = y(z)$. To emphasize the distinction between the pdfs given above, let $f(z|y,\alpha) = k(z|y,\alpha)$ and $f(y|\alpha) = g(y|\alpha)$.


⁶ In the paper by Dempster et al., this integral is over the domain $Z(y)$. However, based on the written description in [11] and the corresponding graphical depiction of the sample spaces in Figure 3.1, the domain "$z$ not in $Z(y)$" appears to be more appropriate for the purpose of this section.

⁷ Heuristically, a statistic computed from a set of observations from some underlying parent distribution is considered sufficient if it contains enough information to make a correct inference about certain properties of the parent distribution. See Chapter 2 of [42] for a more formal definition.
Now, the equation becomes⁸

$$f(z|\alpha) = k(z|y,\alpha)\,g(y|\alpha),$$

where the first term is

$$k(z|y,\alpha) = \frac{b(z)\,e^{\alpha^T t(z)}}{a(\alpha|y)}.$$


Taking the natural logarithm of this equation and rearranging terms yields an equation for the log-likelihood of $g(y|\alpha)$:

$$\begin{aligned}
\ln g(y|\alpha) &= \ln f(z|\alpha) - \ln k(z|y,\alpha)\\
&= \ln b(z) + \alpha^T t(z) - \ln a(\alpha) - \left[\ln b(z) + \alpha^T t(z) - \ln a(\alpha|y)\right]\\
&= \ln a(\alpha|y) - \ln a(\alpha).
\end{aligned}$$

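The $a(\cdot)$ terms can be expressed using the law of total probability:

$$\int_{Z(y)} k(z|y,\alpha)\,dz = \frac{1}{a(\alpha|y)}\int_{Z(y)} b(z)\,e^{\alpha^T t(z)}\,dz = 1 \;\Longrightarrow\; a(\alpha|y) = \int_{Z(y)} b(z)\,e^{\alpha^T t(z)}\,dz,$$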


⁸ An attentive reader may question why $a(\alpha|y)$ is conditioned on $y$ but neither $b(z)$ nor $t(z)$ has this conditioning. The answer lies in Figure 3.1: by conditioning on $y$, $z$ is known to exist only over the restricted sample space $Z(y)$, as will be seen in the subsequent paragraphs. So, although the conditioning is not made explicit for these terms, it has not been neglected. This notation is consistent with that of [11], so the conditioning is not explicitly represented in these terms.
since the conditioning on $y$ implies $z \in Z(y)$, and

$$\int_{Z} f(z|\alpha)\,dz = 1 \;\Longrightarrow\; \int_{Z} b(z)\,e^{\alpha^T t(z)}\,dz = a(\alpha).$$

In a manner analogous to maximizing the log-likelihood function given in Equation (3.2), to maximize $\ln g(y|\alpha)$, take the partial derivative with respect to the parameter vector $\alpha$ and equate it to a row vector of zeros, making the substitutions given in the last two results:


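$$\begin{aligned}
\frac{\partial}{\partial\alpha}\ln g(y|\alpha) &= \frac{\partial}{\partial\alpha}\ln a(\alpha|y) - \frac{\partial}{\partial\alpha}\ln a(\alpha)\\
&= \frac{1}{a(\alpha|y)}\frac{\partial a(\alpha|y)}{\partial\alpha} - \frac{1}{a(\alpha)}\frac{\partial a(\alpha)}{\partial\alpha}\\
&= \frac{1}{a(\alpha|y)}\int_{Z(y)}\frac{\partial}{\partial\alpha}\,b(z)\,e^{\alpha^T t(z)}\,dz - \frac{1}{a(\alpha)}\int_{Z}\frac{\partial}{\partial\alpha}\,b(z)\,e^{\alpha^T t(z)}\,dz\\
&= \int_{Z(y)} t^T(z)\,k(z|y,\alpha)\,dz - \int_{Z} t^T(z)\,f(z|\alpha)\,dz\\
&= E_z\{t^T(z)|y,\alpha\} - E_z\{t^T(z)|\alpha\} = \mathbf{0}^T
\end{aligned} \tag{3.17}$$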


As with the likelihood equation given in Equation (3.3), the left-hand side of Equation (3.17) equals zero (in this case a row vector of zeros) when $\alpha = \alpha_{ml}$, which implies that $E_z\{t(z)|y,\alpha_{ml}\} = E_z\{t(z)|\alpha_{ml}\}$.
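The last steps of Equation (3.17) rest on the exponential-family identity $\partial \ln a(\alpha)/\partial\alpha = E_z\{t^T(z)|\alpha\}$. The sketch below checks it numerically for the Poisson case assumed earlier ($b(z) = 1/z!$, $t(z) = z$), computing $a(\alpha)$ directly as a truncated sum and differentiating by central differences. The truncation point and step size are illustrative assumptions.

```python
import math


def log_a(alpha, zmax=150):
    """ln a(alpha) = ln sum_z b(z) e^{alpha t(z)} for the assumed Poisson case
    b(z) = 1/z!, t(z) = z (sum truncated at zmax)."""
    return math.log(sum(math.exp(alpha * z - math.lgamma(z + 1)) for z in range(zmax)))


def mean_t(alpha, zmax=150):
    """E_z{t(z) | alpha} = sum_z t(z) f(z | alpha), with f from Eq. (3.16)."""
    la = log_a(alpha, zmax)
    return sum(z * math.exp(alpha * z - math.lgamma(z + 1) - la) for z in range(zmax))


alpha = math.log(3.2)   # natural parameter, so lambda = 3.2
h = 1e-5
dlog_a = (log_a(alpha + h) - log_a(alpha - h)) / (2 * h)   # central difference
print(dlog_a, mean_t(alpha))   # both approximately 3.2
```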

The EM algorithm follows from Equation (3.17) when it is interpreted as a two-step process. First, the complete data sufficient statistics $t(z)$ are estimated by $t^{(i)} = E_z\{t(z)|y,\alpha^{(i)}\}$ at iteration $i$ of the algorithm. Since the estimate is the expected value of the sufficient statistics conditioned on the incomplete data observation vector $y$, this step is referred to as the "E-step" or "expectation step." Next, find the value of $\alpha$ that maximizes the likelihood by invoking the equality $E_z\{t(z)|\alpha\} = t^{(i)}$ to obtain the next iteration of the estimated parameter vector, $\alpha^{(i+1)}$ (of course, an initial value $\alpha^{(0)}$ must be set for the algorithm to begin). This second step of the EM algorithm is referred to as the "M-step" or "maximization step."


3.2.2 EM Algorithm Example. Now that the theory of the EM algorithm has been presented, the algorithm will be applied to a simple discrete random variable problem found in [35], which is a modified version of an example given in [11]. Consider an array of five sensors with outputs $\{z_1, z_2, z_3, z_4, z_5\}$ following a Multinomial distribution with pmf $f(z|\alpha)$⁹; this data set and pmf correspond to the complete data. Assume that the sensor array has a malfunction such that the first and second sensor outputs are summed. Then the observed incomplete data set is $\{y_1, y_2, y_3, y_4\}$ ($y_1 = z_1 + z_2$, $y_2 = z_3$, $y_3 = z_4$, and $y_4 = z_5$) with a Multinomial pmf $g(y|\alpha)$. Recall from basic probability that the realizations of the random variables of a Multinomial distribution must sum to the total number of "trials," $n$. In this example, the realizations from the complete data distribution must sum to $n_z$ (i.e., $n_z = z_1 + z_2 + z_3 + z_4 + z_5$). Likewise, the realizations from the incomplete data distribution must sum to $n_y$ (i.e., $n_y = y_1 + y_2 + y_3 + y_4$).

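This scenario is well suited for the EM algorithm, since maximizing the complete data pmf, $f(z|\alpha)$, is easier than maximizing the incomplete data pmf, $g(y|\alpha)$. To see this point, solve for the MLE using both likelihood functions. If

$$f(z|\alpha) = \frac{(z_1+z_2+z_3+z_4+z_5)!}{z_1!\,z_2!\,z_3!\,z_4!\,z_5!}\left(\frac{1}{2}\right)^{z_1}\left(\frac{\alpha}{4}\right)^{z_2}\left(\frac{1}{4}-\frac{\alpha}{4}\right)^{z_3}\left(\frac{1}{4}-\frac{\alpha}{4}\right)^{z_4}\left(\frac{\alpha}{4}\right)^{z_5}$$

$$g(y|\alpha) = \frac{(y_1+y_2+y_3+y_4)!}{y_1!\,y_2!\,y_3!\,y_4!}\left(\frac{1}{2}+\frac{\alpha}{4}\right)^{y_1}\left(\frac{1}{4}-\frac{\alpha}{4}\right)^{y_2}\left(\frac{1}{4}-\frac{\alpha}{4}\right)^{y_3}\left(\frac{\alpha}{4}\right)^{y_4}$$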

⁹ To this point, the notation $p(\cdot)$ has been used to represent a pmf. For this subsection, $f(\cdot)$ and $g(\cdot)$ are used to represent pmfs to stay consistent with the notation used in Section 3.2.1.
then the solutions to the likelihood equation given in Equation (3.3) for each log-likelihood function (or, equivalently but with more algebra, each likelihood function) satisfy¹⁰

$$\begin{aligned}
(z_2+z_3+z_4+z_5)\,\alpha_{ml} - (z_2+z_5) &= 0\\
(y_1+y_2+y_3+y_4)\,\alpha_{ml}^2 - (y_1-2y_2-2y_3-y_4)\,\alpha_{ml} - 2y_4 &= 0.
\end{aligned} \tag{3.18}$$


It is evident that solving a linear equation for $\alpha_{ml}$ using the complete data likelihood function is simpler than solving a quadratic equation using the incomplete data likelihood function. Therefore, the EM algorithm is ideally suited for this example, since it simplifies the MLE problem by introducing the complete data pmf. In fact, the solutions to these equations are

$$\alpha_{ml} = \frac{z_2+z_5}{z_2+z_3+z_4+z_5}$$

$$\alpha_{ml} = \frac{(y_1-2y_2-2y_3-y_4) \pm \sqrt{(y_1-2y_2-2y_3-y_4)^2 + 8y_4(y_1+y_2+y_3+y_4)}}{2(y_1+y_2+y_3+y_4)}.$$
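For comparison with the iterative EM solution developed below, the following sketch evaluates the closed-form incomplete-data MLE from the second solution above, keeping the root that lies in $[0, 1]$ (the cell probabilities $\alpha/4$ and $(1-\alpha)/4$ require this). It uses the observation vector $y = [125, 18, 20, 34]^T$ introduced in the next paragraph; the function name is hypothetical.

```python
import math


def alpha_ml_incomplete(y1, y2, y3, y4):
    """Admissible root of the quadratic in Eq. (3.18), second line."""
    n = y1 + y2 + y3 + y4
    b = y1 - 2 * y2 - 2 * y3 - y4
    disc = b * b + 8 * y4 * n
    roots = [(b + s * math.sqrt(disc)) / (2 * n) for s in (+1, -1)]
    # alpha enters the pmf as cell probabilities alpha/4 and (1 - alpha)/4,
    # so only a root in [0, 1] is admissible
    return next(r for r in roots if 0.0 <= r <= 1.0)


print(round(alpha_ml_incomplete(125, 18, 20, 34), 4))   # 0.6268
```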


Now assume that the observation vector $y = [y_1, y_2, y_3, y_4]^T = [125, 18, 20, 34]^T$ is available for computing the MLE of $\alpha$. First, commence with the E-step to estimate the complete data sufficient statistics $t(z) = z = [z_1, z_2, z_3, z_4, z_5]^T$ using $t^{(i)} = z^{(i)} = E_z\{z|y,\alpha^{(i)}\}$. Since $z_3 = y_2 = 18$, $z_4 = y_3 = 20$, and $z_5 = y_4 = 34$ are given by $y$, only $z_1$ and $z_2$ need to be estimated (recall that $z_1 + z_2 = y_1 = 125$). Referring to [9], the conditional pmfs are (with the observed values substituted for

¹⁰ These expressions are easily obtained by taking the partial derivative of each log-likelihood function with respect to the parameter $\alpha$ and setting the result equal to zero.
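the sufficient statistics where applicable)¹¹:

$$k(z_1, z_2|z_3, z_4, z_5, \alpha^{(i)}) = \frac{125!}{z_1!\,z_2!}\left(\frac{1/2}{1/2+\alpha^{(i)}/4}\right)^{z_1}\left(\frac{\alpha^{(i)}/4}{1/2+\alpha^{(i)}/4}\right)^{z_2}$$

$$k(z_1|z_3, z_4, z_5, \alpha^{(i)}) = \frac{125!}{z_1!\,(125-z_1)!}\left(\frac{1/2}{1/2+\alpha^{(i)}/4}\right)^{z_1}\left(\frac{\alpha^{(i)}/4}{1/2+\alpha^{(i)}/4}\right)^{125-z_1}$$

$$k(z_2|z_3, z_4, z_5, \alpha^{(i)}) = \frac{125!}{z_2!\,(125-z_2)!}\left(\frac{\alpha^{(i)}/4}{1/2+\alpha^{(i)}/4}\right)^{z_2}\left(\frac{1/2}{1/2+\alpha^{(i)}/4}\right)^{125-z_2}$$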


Note that the quantity 125! appears in the numerators above and not 197!, since the conditioning on $z_3$, $z_4$, $z_5$ limits the number of trials of $z_1$ and $z_2$ to $197 - 72 = 125$. These conditional pmfs are Binomial with a mean of $np$, where $n$ is the number of trials and $p$ is the probability of each event [20], so the conditional expectations are


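$$z_1^{(i)} = E_z\{z_1|y,\alpha^{(i)}\} = 125\cdot\frac{1/2}{1/2+\alpha^{(i)}/4}, \qquad z_2^{(i)} = E_z\{z_2|y,\alpha^{(i)}\} = 125\cdot\frac{\alpha^{(i)}/4}{1/2+\alpha^{(i)}/4}. \tag{3.19}$$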


Thus, the estimated sufficient statistics at step $i$ are $z^{(i)} = [z_1^{(i)}, z_2^{(i)}, 18, 20, 34]^T$, and the E-step is finished.

Before proceeding with the M-step, which calculates the next iteration of the parameter estimate, $\alpha^{(i+1)}$, first observe that finding the value of $\alpha$ that satisfies $E_z\{t(z)|\alpha\} = t^{(i)}$ in Equation (3.17) is equivalent to finding a stationary point of $\ln f(z|\alpha)$ with $t(z)$ replaced by $t^{(i)}$.


¹¹ In general,

$$f(x_1, x_2|x_3, x_4, x_5) = \frac{f(x_1, x_2, x_3, x_4, x_5)}{f(x_3, x_4, x_5)} = \frac{(n - x_3 - x_4 - x_5)!}{x_1!\,x_2!}\left(\frac{p_1}{1-p_3-p_4-p_5}\right)^{x_1}\left(\frac{p_2}{1-p_3-p_4-p_5}\right)^{x_2},$$

with

$$f(x_3, x_4, x_5) = \frac{n!}{x_3!\,x_4!\,x_5!\,(n-x_3-x_4-x_5)!}\,p_3^{x_3}p_4^{x_4}p_5^{x_5}\,(1-p_3-p_4-p_5)^{n-x_3-x_4-x_5},$$

where $f(\cdot)$ is a Multinomial pmf, $n$ is the number of trials, and $p_i$ is the probability of $x_i$ for $i = 1, 2, 3, 4, 5$.
This fact follows from

$$\frac{\partial}{\partial\alpha}\ln f(z|\alpha) = \frac{\partial}{\partial\alpha}\big(\ln b(z) + \alpha^T t(z) - \ln a(\alpha)\big) = t^T(z) - E_z\{t^T(z)|\alpha\} = \mathbf{0}^T \;\Longrightarrow\; E_z\{t(z)|\alpha\} = t(z),$$

which is the equation used in the M-step. Therefore, the M-step is carried out by applying MLE using the complete data log-likelihood function $\ln f(z|\alpha)$, which gives

$$\alpha^{(i+1)} = \frac{z_2^{(i)} + 34}{z_2^{(i)} + 18 + 20 + 34} \tag{3.20}$$

(this is the solution of the first of the two likelihood equations given in Equation (3.18)).

Although the previous example is relatively simple, it illustrates the main virtue of the EM algorithm. Notice that the MLE using the incomplete data log-likelihood function $\ln g(y|\alpha)$ requires solving a quadratic equation (the second line of Equation (3.18)), while the corresponding equation for the complete data log-likelihood function is linear (the first line of Equation (3.18)). The algorithm proceeds by first computing the E-step, Equation (3.19), followed by maximization of the likelihood function in the M-step, Equation (3.20). These steps are repeated until the difference between the two most recent successive parameter estimates falls below a desired tolerance.
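The iteration just described can be sketched in a few lines of Python: the E-step is Equation (3.19), the M-step is Equation (3.20), and the loop stops when successive estimates differ by less than a tolerance. The starting value $\alpha^{(0)} = 0.5$, the tolerance, and the function names are illustrative assumptions.

```python
def e_step(alpha, y1=125):
    """Eq. (3.19): conditional expectations of z1 and z2 given y and alpha."""
    denom = 0.5 + alpha / 4.0
    return y1 * 0.5 / denom, y1 * (alpha / 4.0) / denom


def m_step(z2, z3=18, z4=20, z5=34):
    """Eq. (3.20): closed-form complete-data MLE of alpha."""
    return (z2 + z5) / (z2 + z3 + z4 + z5)


def run_em(alpha0=0.5, tol=1e-10, max_iter=500):
    alpha = alpha0
    for i in range(1, max_iter + 1):
        _, z2 = e_step(alpha)        # E-step: only z2 enters the M-step
        alpha_next = m_step(z2)      # M-step
        if abs(alpha_next - alpha) < tol:
            return alpha_next, i
        alpha = alpha_next
    return alpha, max_iter


alpha_hat, iters = run_em()
print(f"alpha = {alpha_hat:.4f} after {iters} iterations")
```

The fixed point of this loop satisfies the same quadratic as the second line of Equation (3.18), so EM reaches the incomplete-data MLE (about 0.6268) while only ever solving the linear complete-data equation.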


3.2.3 General Form of the EM Algorithm. Although Equation (3.17) was introduced as the EM algorithm, this equation is only valid for exponential families of pdfs. A more general form of the algorithm is given in [11]; it is introduced in this subsection as a bridge to the notation used in the next section. This form also enables combining the E- and M-steps. The general form of the EM algorithm defines the E-step as

$$Q(\alpha', \alpha^{(i)}) = E\{\ln f(z|\alpha')\,|\,y, \alpha^{(i)}\}$$
and the M-step as

$$\alpha^{(i+1)} = \arg\max_{\alpha'}\, Q(\alpha', \alpha^{(i)}).$$

The form of these steps makes it possible to combine the E- and M-steps conveniently into a single step.

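As an illustration, the general form of the EM algorithm may be applied to the example in the previous subsection with the E- and M-steps combined:

$$\begin{aligned}
\frac{\partial}{\partial\alpha'}Q(\alpha',\alpha^{(i)}) &= \frac{\partial}{\partial\alpha'}\Big[E\Big\{\ln\frac{(z_1+z_2+18+20+34)!}{z_1!\,z_2!\,18!\,20!\,34!}\,\Big|\,y,\alpha^{(i)}\Big\} + E\{z_1|y,\alpha^{(i)}\}\ln\tfrac{1}{2}\\
&\qquad + E\{z_2|y,\alpha^{(i)}\}\ln\tfrac{\alpha'}{4} + 38\ln\Big(\tfrac{1}{4}-\tfrac{\alpha'}{4}\Big) + 34\ln\tfrac{\alpha'}{4}\Big]\\
&= 0 + 0 + \frac{z_2^{(i)}}{\alpha'} - \frac{38}{1-\alpha'} + \frac{34}{\alpha'} = 0.
\end{aligned}$$

Solving this equation for $\alpha'$ leads to the next iteration of the estimated parameter, $\alpha^{(i+1)}$:

$$\alpha^{(i+1)} = \frac{z_2^{(i)} + 34}{z_2^{(i)} + 18 + 20 + 34},$$

which is the same as Equation (3.20).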
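The equivalence can also be checked numerically: for a fixed $\alpha^{(i)}$, maximizing $Q(\alpha', \alpha^{(i)})$ over $\alpha'$ with a generic optimizer should land on the closed-form update (3.20). The sketch below drops the terms of $Q$ that do not depend on $\alpha'$ and uses a golden-section search; the search method, bracket, and the choice $\alpha^{(i)} = 0.5$ are illustrative assumptions.

```python
import math


def q_func(alpha_p, alpha_i, y1=125, z3=18, z4=20, z5=34):
    """Q(alpha', alpha_i) up to terms that do not depend on alpha'."""
    z2 = y1 * (alpha_i / 4) / (0.5 + alpha_i / 4)   # E{z2 | y, alpha_i}, Eq. (3.19)
    return (z2 + z5) * math.log(alpha_p / 4) + (z3 + z4) * math.log((1 - alpha_p) / 4)


def golden_max(f, lo, hi, tol=1e-12):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):          # maximizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                    # maximizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


alpha_i = 0.5
numeric = golden_max(lambda ap: q_func(ap, alpha_i), 1e-9, 1 - 1e-9)
z2 = 125 * (alpha_i / 4) / (0.5 + alpha_i / 4)
closed = (z2 + 34) / (z2 + 18 + 20 + 34)   # Eq. (3.20)
print(numeric, closed)                      # both approximately 59/97
```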


3.3 Multivariate Gaussian Mixture Estimation

This section applies the theory behind MLE and the EM algorithm to the problem of estimating a multivariate Gaussian mixture pdf. This problem is motivated by the goal of the thesis, which is to represent one Gaussian mixture with another containing a reduced number of components. Although this section pertains to estimating a multivariate Gaussian mixture pdf, the techniques involved may provide valuable insight into approximating one Gaussian mixture with another. The asymptotic representation of the estimates is of particular interest, since it relates the true, full-component mixture to the approximated, lower-component mixture.
In [26], Redner and Walker derive the MLE and EM algorithm equations for estimating the means, covariances, and mixture weights of a multivariate Gaussian mixture pdf. Their derivation includes the constraint that the mixture weights are non-negative and sum to one, but the authors acknowledge that the constraints that the covariances be symmetric and positive definite are not explicitly imposed. However, the forms of the solutions are claimed to preserve the qualities of the initial quantities: if the initial mixture weights are non-negative and sum to one, then so do the estimated mixture weights. Likewise, if the initial covariance is symmetric and positive definite, then the estimated covariance has the same properties (see pp. 217-218 of [26]).

When reading the following sections, one should keep in mind the context of the derivations of the parameter estimation equations. First, the parameters to be estimated are those of the multivariate Gaussian mixture pdf given in Equation (2.23). The resulting likelihood function, provided $N$ i.i.d. sample vector observations, is

$$L(\{z_i\}_1^N;\Omega) = \prod_{i=1}^{N} f(z_i|\Omega) = \prod_{i=1}^{N}\sum_{j=1}^{M} p_j\, f(z_i|\mu_j, P_j). \tag{3.21}$$

Again, it is emphasized that $\{z_i\}_1^N = \{z_1, \ldots, z_N\}$ is a set of i.i.d. vector observations in which each vector has dimension $m$ (see Equation (2.24)); that is, $z_i = [z_{i1}, \ldots, z_{im}]^T$ for $i = 1, \ldots, N$. Also note that the constraints on the parameters are the same as those mentioned in Section 2.3. Second, the MLE approach will maximize this equation using only the constraint that the mixture weights are non-negative and sum to one. Third, for the EM algorithm to be used, Redner and Walker suppose that the sample observations are incomplete in that they are not "labeled" (it is unknown which mixture component spawned which samples) and propose a complete data sample $y_i = (z_i, l_i)$, where $l_i$ is the vector containing the labels for each $z_{i\nu}$ (i.e., for each component sample of each vector set of observations).
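A minimal sketch of the log of Equation (3.21), specialized to scalar observations ($m = 1$) so that it needs only the Python standard library, is shown below. The scalar restriction and the function names are assumptions made for illustration; the multivariate case replaces the scalar normal density with the $m$-dimensional one.

```python
import math


def normal_pdf(z, mu, var):
    return math.exp(-0.5 * (z - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def mixture_log_likelihood(samples, weights, means, variances):
    """ln L({z_i}; Omega) from Eq. (3.21), specialized to scalar samples (m = 1).

    Omega = (weights p_j, means mu_j, variances P_j); the weights are assumed
    non-negative and summing to one.
    """
    total = 0.0
    for z in samples:
        f_z = sum(p * normal_pdf(z, mu, var)
                  for p, mu, var in zip(weights, means, variances))
        total += math.log(f_z)   # log of the mixture density at z_i
    return total


print(mixture_log_likelihood([0.1, 2.9, 3.2], [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]))
```

The log of a sum over components is what couples the parameters and prevents a closed-form maximization, which motivates the EM treatment that follows.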
3.3.1 MLE of a Multivariate Gaussian Mixture. The MLEs of the parameters of a multivariate Gaussian mixture log-likelihood function,

$$\ln L(\{z_i\}_1^N;\Omega) = \sum_{i=1}^{N}\ln f(z_i|\Omega) = \sum_{i=1}^{N}\ln\sum_{j=1}^{M} p_j\, f(z_i|\mu_j, P_j),$$

are the solutions to the likelihood equations,

$$\nabla_{\Omega}\ln L(\{z_i\}_1^N;\Omega)\Big|_{\Omega=\Omega_{ml}} = \mathbf{0},$$

in the unconstrained optimization case¹². After applying the constraint on the mixture weights, the MLE of this parameter is given by (with the $ml$ subscript suppressed) [26]¹³

$$p_j = \frac{1}{N}\sum_{i=1}^{N}\frac{p_j\, f(z_i|\mu_j, P_j)}{f(z_i|\Omega)} \tag{3.22}$$

for $j = 1, \ldots, M$. Next, the MLEs of the mixture component means and covariances are found as the solutions to the unconstrained likelihood equations

$$\nabla_{\mu_j}\ln L(\{z_i\}_1^N;\Omega)\Big|_{\Omega=\Omega_{ml}} = \mathbf{0}, \qquad \nabla_{P_j}\ln L(\{z_i\}_1^N;\Omega)\Big|_{\Omega=\Omega_{ml}} = \mathbf{0}$$


¹² In general, the vector derivative of a scalar function $F(x)$ with respect to a vector $x$ is the vector of partial derivatives with respect to the components of $x$ [35]:

$$\nabla_x F(x) = \left[\frac{\partial F(x)}{\partial x_j}\right]$$


¹³ One troubling aspect of this equation is that it is in terms of itself; that is, the mixture weight appears on both sides of the equation. This apparent flaw is not noted in [26]. Later, the MLE equations will be set aside in favor of the EM algorithm equations, so this apparent mathematical contradiction is avoided.
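for $j = 1, \ldots, M$. These solutions are¹⁴

$$\mu_j = \frac{\displaystyle\sum_{i=1}^{N} z_i\,\frac{p_j\, f(z_i|\mu_j, P_j)}{f(z_i|\Omega)}}{\displaystyle\sum_{i=1}^{N}\frac{p_j\, f(z_i|\mu_j, P_j)}{f(z_i|\Omega)}}$$

$$P_j = \frac{\displaystyle\sum_{i=1}^{N}(z_i-\mu_j)(z_i-\mu_j)^T\,\frac{p_j\, f(z_i|\mu_j, P_j)}{f(z_i|\Omega)}}{\displaystyle\sum_{i=1}^{N}\frac{p_j\, f(z_i|\mu_j, P_j)}{f(z_i|\Omega)}} \tag{3.23}$$

for $j = 1, \ldots, M$.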

The solutions for the MLE of the mixture weights, means, and covariances of a multivariate Gaussian mixture have some nice properties. Equation (3.22) shows that each $p_j$ will be non-negative, since all of the quantities involved, including the pdfs, are non-negative, and that the set sums to one, since $\sum_{j=1}^{M} p_j f(z_i|\mu_j, P_j) = f(z_i|\Omega)$ for each $i$. Furthermore, the covariance parameter solution, Equation (3.23), is a sum of symmetric, rank-one matrices scaled by positive numbers, which produces a symmetric, positive semi-definite matrix [34]. However, Redner and Walker claim that the covariance solution is positive definite as long as the initial covariance estimate is positive definite [26]. The authors' claim may be substantiated in practice, since the possibility of a sample vector $z_i$ equalling $\mu_j$ (thus making the covariance parameter solution positive semi-definite) is unlikely. These properties meet the constraints given above, so the parameters estimated by Equations (3.22) and (3.23) will result in a valid multivariate Gaussian mixture solution.


¹⁴ To derive these results, apply the del operator to the unconstrained likelihood equations, use the chain and product rules and the vector and matrix derivative identities [8], [35]

(a) $\nabla_x\, x^T A x = 2Ax$ (for symmetric $A$)

(b) $\nabla_A\, x^T A^{-1} x = -A^{-T} x x^T A^{-T}$

(c) $\nabla_A \det A = (\det A)\,A^{-T}$

and solve for the parameters.
Although the above estimates appear to meet the requisite constraints on the parameters, they possess at least one troubling property: each solution is in terms of itself. In Equation (3.22), the $p_j$ terms cancel, so that an estimate cannot be made. Also, the mean and covariance solutions in Equation (3.23) are complicated nonlinear relations of the parameters to be estimated.

One means of circumventing this problem (introduced in Section 3.2) is to implement the EM algorithm. By doing so, the estimation equations remain the same, but the estimates become iterative, so that each new estimate is expressed in terms of the current one. This simplification is a direct result of applying the EM algorithm to an MLE problem in which maximizing the likelihood function produced by the algorithm is easier than maximizing the MLE likelihood function.

3.3.2 EM Algorithm for a Multivariate Gaussian Mixture. As mentioned above, the EM algorithm can simplify the MLE solution for the parameters of a multivariate Gaussian mixture if the estimation problem can be posed in terms of complete and incomplete data (see Section 3.2). In this instance, the simplification lies not in the form of the parameter estimate solutions in Equations (3.22) and (3.23) but in the iterative nature of the EM algorithm.

The EM algorithm is derived by modifying the MLE problem according to the developments in Section 3.2 and using the general form of the EM algorithm given in Subsection 3.2.3, rather than the result given for exponential families of distributions. To begin, specify the complete data set $y_i = (z_i, l_i)$, where $l_i$ is the vector containing the labels for each $z_{i\nu}$, as mentioned at the beginning of this section. The complete, incomplete, and conditional likelihood functions are then


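$$f(\{y_i\}_1^N|\Omega) = \prod_{i=1}^{N} p_{l_i}\, f(z_i|\mu_{l_i}, P_{l_i})$$

$$g(\{z_i\}_1^N|\Omega) = \prod_{i=1}^{N} f(z_i|\Omega)$$

$$k(\{y_i\}_1^N|\{z_i\}_1^N,\Omega) = \frac{f(\{y_i\}_1^N|\Omega)}{g(\{z_i\}_1^N|\Omega)}$$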


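Applying the E-step yields [26] ($s$ is the iteration index)

$$\begin{aligned}
Q(\Omega', \Omega^{(s)}) &= E\left\{\ln f(\{y_i\}_1^N|\Omega')\,\middle|\,\{z_i\}_1^N, \Omega^{(s)}\right\}\\
&= \sum_{j=1}^{M}\sum_{i=1}^{N}\ln(p_j')\,\frac{p_j^{(s)} f(z_i|\mu_j^{(s)}, P_j^{(s)})}{f(z_i|\Omega^{(s)})} + \sum_{j=1}^{M}\sum_{i=1}^{N}\ln f(z_i|\mu_j', P_j')\,\frac{p_j^{(s)} f(z_i|\mu_j^{(s)}, P_j^{(s)})}{f(z_i|\Omega^{(s)})}
\end{aligned}$$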


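and applying the M-step¹⁵ produces the EM algorithm solutions for the parameters:

$$p_j^{(s+1)} = \frac{1}{N}\sum_{i=1}^{N}\frac{p_j^{(s)} f(z_i|\mu_j^{(s)}, P_j^{(s)})}{f(z_i|\Omega^{(s)})}$$

$$\mu_j^{(s+1)} = \frac{\displaystyle\sum_{i=1}^{N} z_i\,\frac{p_j^{(s)} f(z_i|\mu_j^{(s)}, P_j^{(s)})}{f(z_i|\Omega^{(s)})}}{\displaystyle\sum_{i=1}^{N}\frac{p_j^{(s)} f(z_i|\mu_j^{(s)}, P_j^{(s)})}{f(z_i|\Omega^{(s)})}}$$

$$P_j^{(s+1)} = \frac{\displaystyle\sum_{i=1}^{N}(z_i-\mu_j^{(s+1)})(z_i-\mu_j^{(s+1)})^T\,\frac{p_j^{(s)} f(z_i|\mu_j^{(s)}, P_j^{(s)})}{f(z_i|\Omega^{(s)})}}{\displaystyle\sum_{i=1}^{N}\frac{p_j^{(s)} f(z_i|\mu_j^{(s)}, P_j^{(s)})}{f(z_i|\Omega^{(s)})}} \tag{3.24}$$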


¹⁵ To obtain this result, use constrained maximization for the mixture weights and unconstrained maximization for the means and covariances, as in Subsection 3.3.1. The derivatives should be taken with respect to the primed variables.
for $j = 1, \ldots, M$. Note that all of the properties mentioned in Subsection 3.3.1 apply to these solutions, but since the estimates are iterative, they are not in terms of themselves as the MLE solutions are.
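A minimal sketch of one pass of Equation (3.24), specialized to scalar data ($m = 1$) so that covariances become variances, is given below. The synthetic two-cluster data set, the initial guess, and the iteration count are hypothetical choices; the update order (weights, then means, then variances using the new means) follows (3.24).

```python
import math
import random


def normal_pdf(z, mu, var):
    return math.exp(-0.5 * (z - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_step(z, p, mu, var):
    """One iteration of Eq. (3.24) specialized to scalar data (m = 1)."""
    N, M = len(z), len(p)
    # E-step: w[i][j] = p_j f(z_i | mu_j, P_j) / f(z_i | Omega)
    w = []
    for zi in z:
        comps = [p[j] * normal_pdf(zi, mu[j], var[j]) for j in range(M)]
        total = sum(comps)
        w.append([c / total for c in comps])
    # M-step: weights, then means, then variances about the new means
    new_p, new_mu, new_var = [], [], []
    for j in range(M):
        wj = [w[i][j] for i in range(N)]
        sj = sum(wj)
        mu_j = sum(wi * zi for wi, zi in zip(wj, z)) / sj
        var_j = sum(wi * (zi - mu_j) ** 2 for wi, zi in zip(wj, z)) / sj
        new_p.append(sj / N)
        new_mu.append(mu_j)
        new_var.append(var_j)
    return new_p, new_mu, new_var


rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(700)]
p, mu, var = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]   # arbitrary initial guess
for _ in range(100):
    p, mu, var = em_step(data, p, mu, var)
print([round(x, 2) for x in p], [round(x, 2) for x in mu])
```

Because the weights are averages of responsibilities that sum to one for each sample, the updated weights automatically remain non-negative and sum to one, consistent with the properties noted above.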

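3.3.3 Asymptotic Representation of the EM Algorithm. The EM algorithm estimates for the parameters of a multivariate Gaussian mixture can be extended asymptotically to include the true mixture pdf. This extension is motivated by the search for a method of comparing a full-component multivariate Gaussian mixture with one having a reduced number of components. To see this point, take the limit as the number of sample vectors approaches infinity and invoke the strong law of large numbers. In the equations that follow, $\Omega_0$ is the full-component or true value of the parameter set, $N$ is the number of sample vectors, and a.s. means almost surely, as previously defined.

$$\lim_{N\to\infty} p_j^{(s+1)} = p_j^{*,(s+1)} \overset{\text{a.s.}}{=} \int_{z\in Z}\frac{p_j^{*,(s)} f(z|\mu_j^{*,(s)}, P_j^{*,(s)})}{f(z|\Omega^{*,(s)})}\,f(z|\Omega_0)\,dz$$

$$\lim_{N\to\infty} \mu_j^{(s+1)} = \mu_j^{*,(s+1)} \overset{\text{a.s.}}{=} \frac{\displaystyle\int_{z\in Z} z\,\frac{p_j^{*,(s)} f(z|\mu_j^{*,(s)}, P_j^{*,(s)})}{f(z|\Omega^{*,(s)})}\,f(z|\Omega_0)\,dz}{\displaystyle\int_{z\in Z}\frac{p_j^{*,(s)} f(z|\mu_j^{*,(s)}, P_j^{*,(s)})}{f(z|\Omega^{*,(s)})}\,f(z|\Omega_0)\,dz}$$

$$\lim_{N\to\infty} P_j^{(s+1)} = P_j^{*,(s+1)} \overset{\text{a.s.}}{=} \frac{\displaystyle\int_{z\in Z}(z-\mu_j^{*,(s+1)})(z-\mu_j^{*,(s+1)})^T\,\frac{p_j^{*,(s)} f(z|\mu_j^{*,(s)}, P_j^{*,(s)})}{f(z|\Omega^{*,(s)})}\,f(z|\Omega_0)\,dz}{\displaystyle\int_{z\in Z}\frac{p_j^{*,(s)} f(z|\mu_j^{*,(s)}, P_j^{*,(s)})}{f(z|\Omega^{*,(s)})}\,f(z|\Omega_0)\,dz} \tag{3.25}$$

for $j = 1, \ldots, M$.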


Of particular interest is the relationship between the asymptotic EM equations (3.25) and the MLE measure function given by Equation (3.14). If the MLE measure function is written in terms of a multivariate Gaussian mixture, then it has the form

$$\int_{z\in Z}\frac{1}{f(z|\Omega)}\,\nabla_{\Omega} f(z|\Omega)\,f(z|\Omega_0)\,dz = 0. \tag{3.26}$$

By setting $\nabla_{\Omega} = \{\nabla_{\mu_j}, \nabla_{P_j}\}$ and solving for the mean and covariance parameters, respectively, one obtains the corresponding equations in Equation (3.25).
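Because Equation (3.25) replaces sample averages with integrals against the true pdf $f(z|\Omega_0)$, for scalar $z$ it can be evaluated by quadrature instead of sampling. The sketch below applies it to reduce a hypothetical three-component "true" mixture to two components. The midpoint rule, the integration range $[-15, 15]$, the example mixtures, and the iteration count are all illustrative assumptions, not choices made in the thesis.

```python
import math


def normal_pdf(z, mu, var):
    return math.exp(-0.5 * (z - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def mix_pdf(z, p, mu, var):
    return sum(pj * normal_pdf(z, mj, vj) for pj, mj, vj in zip(p, mu, var))


def asymptotic_em_step(true_mix, p, mu, var, lo=-15.0, hi=15.0, steps=3000):
    """One pass of Eq. (3.25) for scalar z (m = 1).

    Sample averages are replaced by integrals against the true pdf f(z|Omega_0),
    approximated with a midpoint rule on [lo, hi] (an illustrative choice).
    """
    h = (hi - lo) / steps
    M = len(p)
    s0, s1, s2 = [0.0] * M, [0.0] * M, [0.0] * M   # responsibility-weighted moments
    for k in range(steps):
        z = lo + (k + 0.5) * h
        f0_dz = mix_pdf(z, *true_mix) * h           # f(z|Omega_0) dz
        fs = mix_pdf(z, p, mu, var)                 # f(z|Omega^(s))
        for j in range(M):
            r = p[j] * normal_pdf(z, mu[j], var[j]) / fs
            s0[j] += r * f0_dz
            s1[j] += r * z * f0_dz
            s2[j] += r * z * z * f0_dz
    new_mu = [s1[j] / s0[j] for j in range(M)]
    new_var = [s2[j] / s0[j] - new_mu[j] ** 2 for j in range(M)]  # about the new mean
    return s0, new_mu, new_var


# Hypothetical 3-component "true" mixture reduced to 2 components.
true_mix = ([0.3, 0.3, 0.4], [-3.0, -2.5, 3.0], [1.0, 1.0, 1.0])
p, mu, var = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(40):
    p, mu, var = asymptotic_em_step(true_mix, p, mu, var)
print([round(x, 3) for x in p], [round(x, 3) for x in mu], [round(x, 3) for x in var])
```

In this example the two nearby true components are absorbed into one reduced component whose weight, mean, and variance approximately match their combined moments, while the well-separated third component is kept essentially unchanged.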

3.4 Summary

This chapter explored sample-observation-based pdf estimation using MLE and an iterative implementation of MLE called the EM algorithm, to gain insight into possible methods for approximating a multivariate Gaussian mixture pdf with one containing fewer mixture components. Three asymptotic qualities of MLE (convergence to the true parameter value, convergence to a Gaussian distribution, and efficiency) were derived to emphasize the effectiveness of MLE as a pdf estimation technique. The asymptotic nature of MLE is important because the MLE measure function derived in Equation (3.26) could be used to discriminate between a full-component target state Gaussian mixture pdf and an approximate reduced-component mixture pdf. The EM algorithm makes use of complete data to simplify the MLE problem, as illustrated by the example provided in Subsection 3.2.2, in which a quadratic solution equation was replaced by a linear one. This algorithm was applied to the problem of estimating the parameters of a multivariate Gaussian mixture pdf from sample vector observations in Subsection 3.3.2. It may be possible to use this approach to produce an approximation to a multivariate Gaussian mixture pdf by generating samples from the original pdf and implementing Equation (3.24) to estimate the parameters of the reduced-component mixture pdf.
IV. Approximating Gaussian Mixtures & Mixture
Reduction Algorithms

In the context of the problem of tracking with measurement origin uncertainty outlined in Chapter II, the Bayesian solution results in a Gaussian mixture representation of the target state pdf. As new measurements are received at each scan, the number of mixture components, which represent hypotheses about each measurement in the entire measurement history with regard to the overall target state, increases. The rate of increase is usually exponential, and the tracking problem quickly becomes computationally intractable.


Approximating the full-component target state Gaussian mixture pdf at the end of every measurement processing cycle is one means of remedying this problem. Two types of mixture approximation are generally used in practice. The first is to reduce the mixture to a single component, as in the PDA and JPDA algorithms [30], [31], [38]. However, this method is a rather crude approximation to the original mixture: it ignores well-spaced mixture components, potentially losing valuable information at the end of the scan cycle [30]. The second approach approximates the full-component mixture pdf with one containing fewer components, as in Williams' recently developed Integral Square Error (ISE) cost-function-based mixture reduction algorithm [38], [40], [41] and others.


The focus of this chapter is approximating a full-component target state Gaussian mixture pdf with one having fewer components based on some mathematical measure. This type of approximation reduces the number of mixture components by either merging or deleting existing ones based on the measure of each action. Section 4.1 introduces the various measures used to indicate the goodness of fit of a low-order approximate Gaussian mixture to the original target state Gaussian mixture pdf. Next, Section 4.2 presents two heuristic algorithms, the Greedy algorithms, for choosing the "best" mixture reduction actions from a pool of proposed reductions based on the output of a measure function. Sections 4.3 and 4.4 cover two existing mixture reduction algorithms (MRAs), Salmond's Joining and Clustering algorithms and Williams' ISE cost-function-based algorithm, respectively, for optimally reducing the number of mixture components.


4.1 Measure Functions for Gaussian Mixture Approximation

In the context of this thesis, a measure function used for Gaussian mixture approximation may be categorized as either a true distance measure or a pseudo-distance measure. The distinction between the two classes of distance measures is purely mathematical, since the second type of measure function does not satisfy the triangle inequality¹ but maintains the non-negativity and, in some cases, the symmetry properties of a true distance. A measure function is applied to Gaussian mixture pdf approximation as a means of discriminating between two pdfs, such as a Gaussian mixture pdf and a reduced-component approximation of the same. In this capacity, a measure function provides a criterion for mixture reduction decisions and is a key element of an MRA.

All  of  the  distance  measure  functions  presented  in  this  section  exhibit  at  least 
one  of  the  three  properties  of  a  true  distance  measure: 


1.  Symmetry 

2.  Satisfying  the  triangle  inequality 

3.  Non-negativity.

Symmetry,  as  it  pertains  to  distance  measures,  means  that  the  distance  between  two 
pdfs  is  independent  of  the  order  in  which  the  distance  is  calculated.  For  instance, 
suppose  that  there  are  two  vectors  a  and  b  which  originate  from  a  common  point 

1 The triangle inequality states that the length of the difference vector between two vectors a and b is less than or equal to the sum of the lengths of the two vectors [10]. That is,

$$\|a - b\| \le \|a\| + \|b\|.$$

Another form of the triangle inequality is [34]

$$\|a + b\| \le \|a\| + \|b\|.$$
Figure 4.1: Two vectors having a common origin in the space in which they exist.


(such as the origin of the space in which they exist), as shown in Figure 4.1. One possible symmetric distance measure between the two vectors is the norm of the difference vector, since $\|a - b\| = \|b - a\|$. If a measure function satisfies the triangle inequality, then it is a candidate as a true distance measure [17]. To write a more profound statement about this property would be unwise, given the limited mathematical background of this author. However, the triangle inequality imposes a natural physical constraint on a distance, since the “real-world” concept of distance also has this property². Non-negativity imposes another attractive physical constraint on a true distance measure because it ensures that the measured distance is never negative (although it may be zero).

2 A basic “real-world” example of the triangle inequality is to measure the lengths of two sticks, then “add” the two sticks (using the second form of the triangle inequality) by placing them end to end and measuring their combined length. Common sense dictates that the combined length cannot exceed the sum of the two separate lengths.

Throughout this section, Gaussian mixture pdfs may be thought of as infinite-dimensional vectors in Hilbert space. The length of a pdf in Hilbert space is mathematically defined as the square root of the inner product of the pdf with itself (the norm as defined in Hilbert space) [37]. If the original multivariate Gaussian mixture pdf, given its weight, mean, and covariance parameters, is represented by $f(x|\Omega_o)$, then its length is

$$\|f(x|\Omega_o)\| = \sqrt{\langle f(x|\Omega_o), f(x|\Omega_o) \rangle} = \sqrt{\int_{-\infty}^{\infty} f^2(x|\Omega_o)\,dx}$$

where $\langle \cdot, \cdot \rangle$ is the inner product notation. Similarly, the length of a reduced-component approximation of a multivariate Gaussian mixture pdf, given its weight, mean, and covariance parameters, is

$$\|f(x|\hat{\Omega})\| = \sqrt{\langle f(x|\hat{\Omega}), f(x|\hat{\Omega}) \rangle} = \sqrt{\int_{-\infty}^{\infty} f^2(x|\hat{\Omega})\,dx}.$$

In both of these equations, $\Omega$ represents the weight, mean, and covariance parameters of the mixtures, and the subscript o or “hat” notation indicates that the parameters are for the original mixture or the approximate mixture, respectively.

4.1.1 True Distance Measures. A true distance measure satisfies the three properties of a true distance introduced in Section 4.1. The following true distance measures are presented in this subsection:

•  Kolmogorov Variational Distance (L1 Distance, Total Variation Distance)

•  Integral Square Error cost function

•  Hellinger Distance

•  Correlation Measure

•  Hellinger Affinity Measure (Bhattacharyya coefficient).

The first three true distance measures may be thought of geometrically as measuring the length of the error vector between some function of two pdfs, while the last two true distance measures may be viewed as measuring the cosine of the angle between some function of two pdfs (as in Figure 4.1).


4.1.1.1 Kolmogorov Variational Distance. The Kolmogorov Variational Distance (or L1 Distance) is given by (with the notation T indicating that this measure is a true distance measure) [28], [38]

$$T_K\{f(x|\Omega_o), f(x|\hat{\Omega})\} = \int_{-\infty}^{\infty} \left| f(x|\Omega_o) - f(x|\hat{\Omega}) \right| dx. \tag{4.1}$$


The Kolmogorov Variational Distance may be viewed as the sum of the absolute differences between the uncountably infinite, infinitesimally small elements of the two pdfs. This measure clearly adheres to properties 1 and 3 of a true distance measure because

$$T_K\{f(x|\Omega_o), f(x|\hat{\Omega})\} = T_K\{f(x|\hat{\Omega}), f(x|\Omega_o)\}$$

and the absolute value function ensures that the distance is non-negative. To show that this measure also meets the triangle inequality, let $T_K\{f(x|\Omega_o), 0\}$ represent $\|a\|$ and $T_K\{f(x|\hat{\Omega}), 0\}$ represent $\|b\|$³. Then $T_K\{f(x|\Omega_o) - 0, f(x|\hat{\Omega}) - 0\}$ represents $\|a - b\|$, and the triangle inequality is satisfied since


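$$\begin{aligned}
T_K\{f(x|\Omega_o) - 0, f(x|\hat{\Omega}) - 0\} &= \int_{-\infty}^{\infty} \left| f(x|\Omega_o) - f(x|\hat{\Omega}) \right| dx \\
&\le \int_{-\infty}^{\infty} \left| f(x|\Omega_o) - 0 \right| dx + \int_{-\infty}^{\infty} \left| f(x|\hat{\Omega}) - 0 \right| dx \\
&= T_K\{f(x|\Omega_o), 0\} + T_K\{f(x|\hat{\Omega}), 0\}.
\end{aligned}$$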


Equation (4.1) seems difficult to evaluate without approximation. For instance, the absolute value breaks the integral into two (not necessarily contiguous) pieces: one in which $f(x|\Omega_o) > f(x|\hat{\Omega})$ and the other in which $f(x|\Omega_o) < f(x|\hat{\Omega})$. Since the pdfs are Gaussian mixtures, the integral over such a portion of the domain would be extremely difficult to evaluate without some sort of approximation [38], such as numerical integration. Assuming that such an approximation is computationally expensive relative to the update time of the tracking system, this measure function appears unsuitable for practical real-time implementation.

3 Note that the distance between a non-zero vector and the zero vector is simply the length of the non-zero vector using the given distance measure.
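To make the evaluation burden concrete, the following Python sketch approximates Equation (4.1) by brute-force quadrature. It assumes scalar (one-dimensional) Gaussian mixtures stored as lists of (weight, mean, variance) tuples, a finite integration window, and the trapezoid rule; these choices and the function names are illustrative, not part of the thesis. A single evaluation already requires tens of thousands of pdf evaluations, and grid-based quadrature of this kind grows rapidly with state dimension, which is consistent with the concern above about real-time use.

```python
import math

# Illustrative representation (an assumption): a 1-D Gaussian mixture is a list of
# (weight, mean, variance) tuples. The thesis works with multivariate mixtures.


def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, mixture):
    """Evaluate a weighted sum of Gaussian densities at x."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in mixture)


def kolmogorov_distance(f_o, f_hat, lo=-20.0, hi=20.0, n=20001):
    """Approximate T_K of Equation (4.1) with the trapezoid rule on [lo, hi].

    No closed form is attempted: the sign of f_o - f_hat changes wherever the
    two mixtures cross, so the absolute difference is simply sampled on a grid.
    """
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        end_weight = 0.5 if i in (0, n - 1) else 1.0
        total += end_weight * abs(mixture_pdf(x, f_o) - mixture_pdf(x, f_hat))
    return total * h


if __name__ == "__main__":
    original = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
    merged = [(1.0, 0.0, 2.0)]  # moment-preserving merge of the two components
    print(kolmogorov_distance(original, merged))
```

Because the integrand has kinks where the two mixtures cross, the trapezoid rule converges more slowly here than it would for a smooth integrand.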


4.1.1.2 Integral Square Error Cost Function. Williams' Integral Square Error (ISE) cost function demonstrated the best performance for tracking a single target in heavy clutter when implemented in an MRA for a Bayesian tracking algorithm in the presence of measurement origin uncertainty [38]. His cost function is


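$$\begin{aligned}
T_{ISE}\{f(x|\Omega_o), f(x|\hat{\Omega})\} &= \left\langle f(x|\Omega_o) - f(x|\hat{\Omega}),\; f(x|\Omega_o) - f(x|\hat{\Omega}) \right\rangle \\
&= \int_{-\infty}^{\infty} \left[ f(x|\Omega_o) - f(x|\hat{\Omega}) \right]^2 dx \\
&= \int_{-\infty}^{\infty} \left[ f^2(x|\Omega_o) + f^2(x|\hat{\Omega}) - 2 f(x|\Omega_o) f(x|\hat{\Omega}) \right] dx
\end{aligned} \tag{4.2}$$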

where the first term in the last line of this equation is the original mixture self-likeness term, the second term is the reduced mixture self-likeness term, and the third term is the cross-likeness term [38], [40], [41]. The ISE cost function is a true distance measure since it meets the requisite properties listed in Section 4.1. Properties 1 and 3 hold because interchanging the pdfs does not change the measure (it is symmetric) and the squaring function ensures the distance measure is non-negative. It satisfies the triangle inequality since (using similar representations for $\|a\|$, $\|b\|$, and $\|a - b\|$ as before)


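$$\begin{aligned}
T_{ISE}\{f(x|\Omega_o) - 0, f(x|\hat{\Omega}) - 0\} &= \int_{-\infty}^{\infty} \left[ f(x|\Omega_o) - f(x|\hat{\Omega}) \right]^2 dx \\
&= \int_{-\infty}^{\infty} \left[ f^2(x|\Omega_o) + f^2(x|\hat{\Omega}) \right] dx - \int_{-\infty}^{\infty} 2 f(x|\Omega_o) f(x|\hat{\Omega})\,dx \\
&\le \int_{-\infty}^{\infty} \left[ f^2(x|\Omega_o) + f^2(x|\hat{\Omega}) \right] dx \\
&= \int_{-\infty}^{\infty} \left[ f(x|\Omega_o) - 0 \right]^2 dx + \int_{-\infty}^{\infty} \left[ f(x|\hat{\Omega}) - 0 \right]^2 dx \\
&= T_{ISE}\{f(x|\Omega_o), 0\} + T_{ISE}\{f(x|\hat{\Omega}), 0\},
\end{aligned}$$

where the inequality holds because the product of two pdfs is never negative, so the cross-likeness integral is non-negative.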


Unlike the Kolmogorov Variational Distance, an exact closed-form solution exists for the ISE cost function when the pdfs are multivariate Gaussian mixtures, so this true distance measure is well-suited for real-time application as the reduction decision criterion in an MRA [38], [40], [41]. By observing that the first line in Equation (4.2) is the squared length of the point-by-point difference between two pdfs, the ISE cost function may be interpreted using the Hilbert space vector analogy as the squared length of the error vector in Figure 4.1. From this perspective, it should not be surprising that the ISE cost function has been shown to provide the best single-target-in-heavy-clutter tracking performance to date when implemented as the MRA for a Bayesian tracking in clutter algorithm [38], [30], [31]. For the same area of discrepancy between the two densities $f(x|\Omega_o)$ and $f(x|\hat{\Omega})$, a high, narrow region of discrepancy is weighted more severely by this cost measure than a low, broad one, unlike the case of using the Kolmogorov Variational Distance [38].
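The closed form rests on the Gaussian product identity $\int N(x; \mu_i, P_i)\,N(x; \mu_j, P_j)\,dx = N(\mu_i; \mu_j, P_i + P_j)$, so every term in Equation (4.2) reduces to a double sum over component pairs. The Python sketch below applies that identity in the scalar case only (an assumption made for brevity); the (weight, mean, variance) tuple layout and the function names are illustrative and are not Williams' implementation.

```python
import math

# Illustrative scalar setting: a mixture is a list of (weight, mean, variance).


def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def inner(f, g):
    """Closed-form <f, g> = integral of f(x) g(x) dx for two Gaussian mixtures.

    Uses: integral of N(x; m1, v1) N(x; m2, v2) dx = N(m1; m2, v1 + v2).
    """
    return sum(w1 * w2 * gauss_pdf(m1, m2, v1 + v2)
               for w1, m1, v1 in f
               for w2, m2, v2 in g)


def ise(f_o, f_hat):
    """T_ISE of Equation (4.2): both self-likeness terms minus twice the cross-likeness."""
    return inner(f_o, f_o) + inner(f_hat, f_hat) - 2.0 * inner(f_o, f_hat)


original = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
merged = [(1.0, 0.0, 2.0)]    # moment-preserving merge of both components
deleted = [(1.0, 1.0, 1.0)]   # first component deleted, weight renormalized
print(ise(original, merged), ise(original, deleted))
```

Worked by hand, the merge costs roughly 0.0025 and the deletion roughly 0.089, so an MRA driven by Equation (4.2) would prefer the merge for this example.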
It is interesting to point out that the cross-likeness term by itself is used as a measure in [15]. In this paper, the authors suggest that the Expected Likelihood Kernel could be used for discrimination between two pdfs. This measure is (where M is used to distinguish this measure from a true distance measure)


$$M_E\{f(x|\Omega_o), f(x|\hat{\Omega})\} = \int_{-\infty}^{\infty} f(x|\Omega_o) f(x|\hat{\Omega})\,dx \tag{4.3}$$


which, apart from a factor of $-2$, is the cross-likeness term in the ISE cost function, Equation (4.2). Although this measure has an exact closed-form solution, it is not immediately clear how to use this measure to make mixture reduction decisions⁴. For instance, if it is used as a measure of the distance between a pdf and an approximation of this pdf, one would like to select the approximation that produces the smallest value of Equation (4.3) (i.e., the smallest distance). However, the concept of orthogonality in mathematics would indicate that if Equation (4.3) evaluates to zero for two pdfs, then the pdfs are very dissimilar. So, small values for this measure would mean that the two pdfs are not similar. For this reason, the Expected Likelihood Kernel is not categorized as a distance measure, nor is it clear how to apply this measure to mixture reduction⁵.
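A small numerical example illustrates the difficulty. It assumes scalar Gaussians and reuses the Gaussian product identity that gives Equation (4.3) its closed form; the function name and the three test densities are hypothetical choices made only for this illustration.

```python
import math


def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def expected_likelihood_kernel(f, g):
    """M_E of Equation (4.3) for scalar mixtures of (weight, mean, variance) tuples."""
    return sum(w1 * w2 * gauss_pdf(m1, m2, v1 + v2)
               for w1, m1, v1 in f
               for w2, m2, v2 in g)


original = [(1.0, 0.0, 1.0)]
exact_copy = [(1.0, 0.0, 1.0)]     # a perfect "approximation"
too_narrow = [(1.0, 0.0, 0.01)]    # a poor approximation concentrated at the peak
far_away = [(1.0, 10.0, 1.0)]      # a poor approximation with almost no overlap

for name, approx in [("exact", exact_copy), ("narrow", too_narrow), ("far", far_away)]:
    print(name, expected_likelihood_kernel(original, approx))
```

By hand, the exact copy scores about 0.282, the overly narrow density about 0.397, and the distant density nearly zero. The kernel therefore rates a poor, narrow approximation above the exact one while also giving near-zero values to poor approximations, so neither minimizing nor maximizing it selects the best approximation.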


4.1.1.3 Hellinger Distance. The Hellinger Distance is similar to the ISE cost function, except it operates on the square root of the pdfs rather than directly on the pdfs. Several variations of the Hellinger Distance are found in the literature [15], [24], [25], and in this thesis, the Hellinger Distance is defined as [24]

$$\begin{aligned}
T_H\{f(x|\Omega_o), f(x|\hat{\Omega})\} &= \frac{1}{2} \left\langle \sqrt{f(x|\Omega_o)} - \sqrt{f(x|\hat{\Omega})},\; \sqrt{f(x|\Omega_o)} - \sqrt{f(x|\hat{\Omega})} \right\rangle \\
&= \frac{1}{2} \int_{-\infty}^{\infty} \left[ \sqrt{f(x|\Omega_o)} - \sqrt{f(x|\hat{\Omega})} \right]^2 dx \\
&= 1 - \int_{-\infty}^{\infty} \sqrt{f(x|\Omega_o) f(x|\hat{\Omega})}\,dx.
\end{aligned} \tag{4.4}$$

4 A more detailed discussion of the suitability of the Expected Likelihood Kernel as an appropriate measure function is provided in Section 5.1.

5 Although the Expected Likelihood Kernel appears as a component in the ISE cost function (up to a constant factor, it is the cross-likeness term) and in the Correlation Measure (presented in Subsection 4.1.1.4), it is not conceptually related to either of these true distance measure functions, and outputs of the Expected Likelihood Kernel do not have the physical interpretation of a distance.


Given the similarity between the Hellinger Distance and the ISE cost function, it is readily apparent that the Hellinger Distance meets the first and third requirements of a true distance. As expected, it also satisfies the triangle inequality:


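$$\begin{aligned}
T_H\{f(x|\Omega_o) - 0, f(x|\hat{\Omega}) - 0\} &= \frac{1}{2} \int_{-\infty}^{\infty} \left[ \sqrt{f(x|\Omega_o)} - \sqrt{f(x|\hat{\Omega})} \right]^2 dx \\
&= 1 - \int_{-\infty}^{\infty} \sqrt{f(x|\Omega_o) f(x|\hat{\Omega})}\,dx \\
&\le 1 \\
&= \frac{1}{2} \int_{-\infty}^{\infty} f(x|\Omega_o)\,dx + \frac{1}{2} \int_{-\infty}^{\infty} f(x|\hat{\Omega})\,dx \\
&= T_H\{f(x|\Omega_o), 0\} + T_H\{f(x|\hat{\Omega}), 0\}.
\end{aligned}$$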

Equation (4.4) shows that the Hellinger Distance has the potential to be a promising alternative to the ISE cost function since it does not require the computation of the two self-likeness terms in Equation (4.2). However, the integral of the square root of a product of multivariate Gaussian mixture pdfs is extremely difficult to evaluate exactly in closed form [15], [38]. Placing numerical evaluation aside, the Hellinger Distance could be approximated in closed form using a truncated binomial series or the “heuristic approximation” proposed in [15] of replacing the square root of a sum with a sum of square roots.
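Without a closed form for mixtures, the most direct way to see what Equation (4.4) computes is numerical quadrature, which is exactly the kind of evaluation the paragraph above sets aside for real-time use. The sketch below assumes scalar mixtures of (weight, mean, variance) tuples and a fixed trapezoid grid (both illustrative choices, with hypothetical function names); it evaluates the first and last forms of Equation (4.4) and checks that they agree.

```python
import math


def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, mixture):
    """Evaluate a weighted sum of Gaussian densities at x."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in mixture)


def trapezoid(fn, lo=-20.0, hi=20.0, n=20001):
    """Composite trapezoid rule for the integral of fn over [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (fn(lo) + fn(hi))
    total += sum(fn(lo + i * h) for i in range(1, n - 1))
    return total * h


def hellinger_distance(f_o, f_hat):
    """T_H via the first form of (4.4): 0.5 * integral of (sqrt f_o - sqrt f_hat)^2."""
    return 0.5 * trapezoid(lambda x: (math.sqrt(mixture_pdf(x, f_o))
                                      - math.sqrt(mixture_pdf(x, f_hat))) ** 2)


def hellinger_via_affinity(f_o, f_hat):
    """T_H via the last form of (4.4): 1 - integral of sqrt(f_o * f_hat)."""
    return 1.0 - trapezoid(lambda x: math.sqrt(mixture_pdf(x, f_o) * mixture_pdf(x, f_hat)))


original = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
merged = [(1.0, 0.0, 2.0)]
print(hellinger_distance(original, merged), hellinger_via_affinity(original, merged))
```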


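4.1.1.4 Correlation Measure. The Correlation Measure is the first of two true distance measures which may conceptually be thought of as calculating the cosine of the angle between the original and approximate reduced-component Gaussian mixture pdfs as if they were vectors in Hilbert space (see Figure 4.1)⁶. It is written as [19]

$$\begin{aligned}
T_C\{f(x|\Omega_o), f(x|\hat{\Omega})\} &= \frac{\langle f(x|\Omega_o), f(x|\hat{\Omega}) \rangle}{\sqrt{\langle f(x|\Omega_o), f(x|\Omega_o) \rangle \, \langle f(x|\hat{\Omega}), f(x|\hat{\Omega}) \rangle}} \\
&= \frac{\int_{-\infty}^{\infty} f(x|\Omega_o) f(x|\hat{\Omega})\,dx}{\sqrt{\int_{-\infty}^{\infty} f^2(x|\Omega_o)\,dx \int_{-\infty}^{\infty} f^2(x|\hat{\Omega})\,dx}}.
\end{aligned} \tag{4.5}$$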


This measure is used in the field of communication systems as a means of determining the similarity between two signals [19]. Again, using a vector analogy, one can easily see that if two signals (vectors) are the same, then the angle between the two signals (vectors) is zero and the cosine of this angle is one. If the two signals (vectors) are perpendicular, then they are considered “indifferent” to each other or independent, and the cosine of the angle between the two signals (vectors) is zero [19]. Finally, if the two signals (vectors) point in opposite directions, then the angle between them is 180°, and the signals (vectors) are completely different [19]. Returning to Equation (4.5) and noting that the two pdfs never assume negative values, the range of possible values of the right-hand side of this equation is restricted to lie between zero and one. Thus, if one seeks a good approximation of a Gaussian mixture pdf, then the Correlation Measure between the original and approximating mixtures should be close to one.

6 The cosine of the angle between two vectors a and b in a Euclidean space of any dimension is given by (using the more general inner product notation) [34]

$$\cos\theta = \frac{\langle a, b \rangle}{\sqrt{\langle a, a \rangle \langle b, b \rangle}}.$$
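Because the numerator and both factors in the denominator of Equation (4.5) are inner products of Gaussian mixtures, the Correlation Measure inherits the same closed form as the ISE terms. A minimal scalar sketch, assuming mixtures stored as (weight, mean, variance) tuples and hypothetical function names:

```python
import math


def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def inner(f, g):
    """Closed-form <f, g> for scalar Gaussian mixtures (Gaussian product identity)."""
    return sum(w1 * w2 * gauss_pdf(m1, m2, v1 + v2)
               for w1, m1, v1 in f
               for w2, m2, v2 in g)


def correlation_measure(f_o, f_hat):
    """T_C of Equation (4.5): cosine of the angle between the two pdfs."""
    return inner(f_o, f_hat) / math.sqrt(inner(f_o, f_o) * inner(f_hat, f_hat))


original = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
merged = [(1.0, 0.0, 2.0)]    # moment-preserving merge
deleted = [(1.0, 1.0, 1.0)]   # first component deleted, weight renormalized
print(correlation_measure(original, merged), correlation_measure(original, deleted))
```

By hand these come to roughly 0.994 and 0.827, so the merge is preferred because its value is closer to one.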


Showing that Equation (4.5) is a true distance measure requires a deviation from the previous pattern of proofs. The Correlation Measure is symmetric because interchanging the arguments of this distance measure does not affect the distance calculation, and it is non-negative since it is restricted to values between zero and one, as noted in the previous paragraph. In addition, this measure must satisfy the triangle inequality since the numerator of Equation (4.5) must be less than or equal to the denominator by the Schwarz inequality (with equality if and only if the two pdfs are identical), and the triangle inequality reduces to the Schwarz inequality [34]. Thus the Correlation Measure has the quality of being a true distance.


4.1.1.5 Hellinger Affinity Measure. The Hellinger Affinity Measure [24] (or Bhattacharyya coefficient) is the second true distance measure which may be viewed as the cosine of the angle between the square root of the original and the square root of the approximate reduced-component Gaussian mixture pdfs using the Hilbert space vector analogy⁷. This measure was considered for application to Gaussian mixture reduction by Lainiotis and Park in [18], and it has the form

$$\begin{aligned}
T_A\{f(x|\Omega_o), f(x|\hat{\Omega})\} &= \left\langle \sqrt{f(x|\Omega_o)}, \sqrt{f(x|\hat{\Omega})} \right\rangle \\
&= \int_{-\infty}^{\infty} \sqrt{f(x|\Omega_o) f(x|\hat{\Omega})}\,dx.
\end{aligned} \tag{4.6}$$

Notice that Equation (4.6) is simply the cosine of the angle between the element-wise square roots of the vectors representing the two pdfs in Hilbert space. This insight is apparent when one considers replacing the pdfs in the Correlation Measure with the square root of each pdf. Then, the denominator would reduce to one because the integral of a pdf over the entire sample space of the random variable it describes is one. The Hellinger Affinity Measure satisfies the three requirements of a true distance, which can be shown in a similar manner as that for the Correlation Measure, and it suffers from the same difficulty in finding an exact closed-form solution as the Hellinger Distance (their functional forms are clearly related, as seen by comparing Equations (4.6) and (4.4); indeed, $T_H = 1 - T_A$).

7 In this case, the square root of the infinite-dimensional vector representing each pdf is taken as the element-wise square root, since the square root of a vector is otherwise undefined.
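For a pair of single Gaussian components, Equation (4.6) does have a well-known closed form, which in the scalar case is $\sqrt{2\sigma_1\sigma_2/(\sigma_1^2+\sigma_2^2)}\,\exp[-(\mu_1-\mu_2)^2/(4(\sigma_1^2+\sigma_2^2))]$; it is the square root of a sum of components that blocks the same approach for mixtures. The Python sketch below (scalar case, hypothetical names, trapezoid quadrature as an illustrative fallback) checks the single-component formula against quadrature and, for a two-component mixture, confirms numerically that $T_A = 1 - T_H$.

```python
import math


def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, mixture):
    """Evaluate a weighted sum of Gaussian densities at x."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in mixture)


def trapezoid(fn, lo=-20.0, hi=20.0, n=20001):
    """Composite trapezoid rule for the integral of fn over [lo, hi]."""
    h = (hi - lo) / (n - 1)
    return h * (0.5 * (fn(lo) + fn(hi)) + sum(fn(lo + i * h) for i in range(1, n - 1)))


def affinity_single(m1, v1, m2, v2):
    """Closed-form T_A (Bhattacharyya coefficient) for two scalar Gaussians."""
    s1, s2 = math.sqrt(v1), math.sqrt(v2)
    return math.sqrt(2.0 * s1 * s2 / (v1 + v2)) * math.exp(-(m1 - m2) ** 2 / (4.0 * (v1 + v2)))


def affinity_numeric(f_o, f_hat):
    """T_A of Equation (4.6) by quadrature; needed once either pdf is a mixture."""
    return trapezoid(lambda x: math.sqrt(mixture_pdf(x, f_o) * mixture_pdf(x, f_hat)))


def hellinger_numeric(f_o, f_hat):
    """T_H of Equation (4.4), first form, by quadrature."""
    return 0.5 * trapezoid(lambda x: (math.sqrt(mixture_pdf(x, f_o))
                                      - math.sqrt(mixture_pdf(x, f_hat))) ** 2)


print(affinity_single(0.0, 1.0, 1.5, 2.0), affinity_numeric([(1.0, 0.0, 1.0)], [(1.0, 1.5, 2.0)]))
original = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
merged = [(1.0, 0.0, 2.0)]
print(affinity_numeric(original, merged), 1.0 - hellinger_numeric(original, merged))
```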

4.1.2 Pseudo-Distance Measures. Pseudo-distance measures differ from true distance measures because they do not satisfy all three properties of a true distance listed in Section 4.1. However, this distinction should not preclude their application to pdf discrimination problems. In fact, the two Kullback-Leibler measures are probably used more in practice than all of the true distance measures combined [12], [15], [24]. The following pseudo-distance measures are presented in this subsection:


•  Kullback-Leibler  Mean  Information 

•  Kullback-Leibler  Divergence 

•  Salmond’s  Joining  Algorithm  cost  function. 

The  Kullback-Leibler  measures  were  originally  derived  for  use  in  sample  observation- 
based  problems  in  which  the  true  pdf  of  the  parent  distribution  (i.e. ,  the  distribution 
that  spawned  the  samples)  is  unknown,  but  the  measures  may  also  be  applied  when 
the  true  pdf  is  known  exactly  (as  is  the  case  for  this  thesis). 


4.1.2.1 Kullback-Leibler Mean Information. When applied to a sample observation-based problem, one interpretation of the Kullback-Leibler Mean Information measure is that it provides an indication of the amount of new information gained for discrimination between two pdfs by making an additional observation [17]. As a pseudo-distance measure, the Kullback-Leibler Mean Information measure is not symmetric, and it does not satisfy the triangle inequality [17]. However, by using Jensen's inequality (see Equation (3.9)), one can show that this distance measure is non-negative [24]. The Kullback-Leibler Mean Information is (the D standing for distance, to distinguish pseudo-distance measures from true distance measures)


$$D_{MI}\{f(x|\Omega_o), f(x|\hat{\Omega})\} = \int_{-\infty}^{\infty} f(x|\Omega_o) \ln \frac{f(x|\Omega_o)}{f(x|\hat{\Omega})}\,dx. \tag{4.7}$$


The natural logarithm in Equation (4.7) makes obtaining an exact closed-form solution of this distance extremely difficult when the pdfs are Gaussian mixtures [38].
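Lacking a closed form for mixtures, Equation (4.7) has to be evaluated numerically or approximated. The following sketch assumes scalar mixtures of (weight, mean, variance) tuples, works with log-densities (via a log-sum-exp) so that the ratio inside the logarithm stays finite in the tails, and uses the trapezoid rule; names and grid settings are illustrative. It checks the result against the known closed form for two single Gaussians and exposes the asymmetry noted above.

```python
import math


def log_gauss(x, mean, var):
    """Log of the scalar Gaussian density N(x; mean, var)."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def log_mixture(x, mixture):
    """Log of a weighted sum of Gaussians, computed with log-sum-exp."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def kl_mean_information(f_o, f_hat, lo=-20.0, hi=20.0, n=20001):
    """D_MI of Equation (4.7): integral of f_o * ln(f_o / f_hat), trapezoid rule."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        lf, lg = log_mixture(x, f_o), log_mixture(x, f_hat)
        end_weight = 0.5 if i in (0, n - 1) else 1.0
        total += end_weight * math.exp(lf) * (lf - lg)
    return total * h


def kl_single_gaussians(m1, v1, m2, v2):
    """Known closed form of D_MI for two scalar Gaussians, used only as a check."""
    return 0.5 * math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / (2.0 * v2) - 0.5


p = [(1.0, 0.0, 1.0)]
q = [(1.0, 1.0, 4.0)]
print(kl_mean_information(p, q), kl_mean_information(q, p))
```

By hand, the two directions give about 0.443 and 1.307 for these Gaussians, so swapping the arguments changes the value substantially.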

4.1.2.2 Kullback-Leibler Divergence. The Kullback-Leibler Divergence measure provides a sense of the difficulty in discriminating between two pdfs [17]. Like the Mean Information, the Divergence is non-negative, but unlike the Mean Information, the Divergence is symmetric [17], [24]. The Kullback-Leibler Divergence is given by


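$$D_D\{f(x|\Omega_o), f(x|\hat{\Omega})\} = \int_{-\infty}^{\infty} \left[ f(x|\Omega_o) - f(x|\hat{\Omega}) \right] \ln \frac{f(x|\Omega_o)}{f(x|\hat{\Omega})}\,dx. \tag{4.8}$$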


Again, the presence of the natural logarithm function makes finding an exact closed-form solution extremely difficult when the pdfs are Gaussian mixtures [38].
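The symmetry of the Divergence can be seen by expanding Equation (4.8) into the Mean Information taken in both directions, $D_D\{f(x|\Omega_o), f(x|\hat{\Omega})\} = D_{MI}\{f(x|\Omega_o), f(x|\hat{\Omega})\} + D_{MI}\{f(x|\hat{\Omega}), f(x|\Omega_o)\}$. The sketch below, assuming scalar mixtures of (weight, mean, variance) tuples, log-domain evaluation, and trapezoid quadrature (illustrative choices with hypothetical names), evaluates Equation (4.8) directly.

```python
import math


def log_gauss(x, mean, var):
    """Log of the scalar Gaussian density N(x; mean, var)."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def log_mixture(x, mixture):
    """Log of a weighted sum of Gaussians, computed with log-sum-exp."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def kl_divergence(f_o, f_hat, lo=-20.0, hi=20.0, n=20001):
    """D_D of Equation (4.8): integral of (f_o - f_hat) * ln(f_o / f_hat)."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        lf, lg = log_mixture(x, f_o), log_mixture(x, f_hat)
        end_weight = 0.5 if i in (0, n - 1) else 1.0
        total += end_weight * (math.exp(lf) - math.exp(lg)) * (lf - lg)
    return total * h


p = [(1.0, 0.0, 1.0)]
q = [(1.0, 1.0, 4.0)]
print(kl_divergence(p, q), kl_divergence(q, p))
```

For these two Gaussians, both orderings equal 1.75 by hand, the sum of the two one-directional Mean Information values (about 0.443 and 1.307).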


4.1.2.3 Salmond's Joining Algorithm Cost Function. Salmond based the cost function for his Joining Algorithm on penalizing changes to the structure of the original Gaussian mixture caused by reduction actions [30]. The cost function is derived by first setting the covariance of the full-component mixture equal to the covariance of the reduced-component mixture, thus maintaining the “structure” of the original mixture. In the final step of his derivation, Salmond develops a scalar cost function based on the Mahalanobis distance [30], [31]. This cost function meets the symmetry and non-negativity properties of a true distance measure but not the triangle inequality, and so it is classified as a pseudo-distance measure.
Salmond's Joining Algorithm⁸ cost function is (breaking from the previous notation to maintain consistency with Salmond's notation)

$$d_{ij}^2 = \frac{p_i\,p_j}{p_i + p_j}\,(\mu_i - \mu_j)^T P^{-1} (\mu_i - \mu_j) \tag{4.9}$$

where $d_{ij}^2$ is the squared distance resulting from merging mixture components i and j, $p_i$ and $\mu_i$ are the weight and mean of component i, and P is the overall covariance of the mixture [31]. The distances between components are compared pair-wise using this cost function, and the components that fall below some distance threshold are merged [31]. Salmond's Joining Algorithm will be discussed further in Section 4.3.
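As a worked instance of Equation (4.9), the scalar sketch below computes the overall mixture variance P and every pair-wise cost $d_{ij}^2$. It assumes one-dimensional components stored as (weight, mean, variance) tuples, so the quadratic form $(\mu_i - \mu_j)^T P^{-1} (\mu_i - \mu_j)$ reduces to $(\mu_i - \mu_j)^2 / P$; the function names and the example mixture are hypothetical.

```python
from itertools import combinations


def overall_moments(mixture):
    """Overall mean and variance (the scalar P) of a Gaussian mixture."""
    total = sum(w for w, _, _ in mixture)
    mean = sum(w * m for w, m, _ in mixture) / total
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in mixture) / total
    return mean, var


def joining_costs(mixture):
    """Pair-wise Joining Algorithm costs d_ij^2 of Equation (4.9), scalar case."""
    _, p_cov = overall_moments(mixture)
    costs = {}
    for i, j in combinations(range(len(mixture)), 2):
        p_i, mu_i, _ = mixture[i]
        p_j, mu_j, _ = mixture[j]
        costs[(i, j)] = p_i * p_j / (p_i + p_j) * (mu_i - mu_j) ** 2 / p_cov
    return costs


mixture = [(0.4, -0.2, 1.0), (0.4, 0.2, 1.0), (0.2, 6.0, 1.0)]
for pair, cost in sorted(joining_costs(mixture).items(), key=lambda kv: kv[1]):
    print(pair, round(cost, 4))
```

By hand, P = 6.792 for this mixture; the close pair (0, 1) costs about 0.0047, while the two pairs involving the distant component cost roughly 0.66 and 0.75, so (0, 1) is the natural merge candidate.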


4.2 Greedy Algorithms for the Assignment Problem

Consider the problem of reducing a multivariate Gaussian mixture pdf with $N_H(k)$ mixture components to one with a reduced number $N_R(k)$, $N_H(k) > N_R(k)$, by either merging or deleting components. A pool of potential reduction actions for the original mixture pdf is formed by proposing the deletion of any one of the mixture components, $1, \ldots, N_H(k)$, or the merging of every pair of distinct mixture components, where the merged component's parameters are calculated using Equation (2.26). After all possible actions are proposed, this pool consists of $N_H(k)$ proposed mixture component deletion actions and $N_H(k)[N_H(k) - 1]/2$ proposed merge actions⁹. Selecting the “best” reduction action or actions from the set of $N_H(k) + \{N_H(k)[N_H(k) - 1]/2\}$ possible actions may be viewed as an assignment problem, and either of two heuristic algorithms, called Greedy Algorithm A and Greedy Algorithm B¹⁰, may be applied


8 A description of Salmond's Joining and Clustering algorithms is given in Section 4.3.

9 The number of possible merge actions is equivalent to selecting two mixture components from $N_H(k)$ mixture components without replacement and without regard to order. Thus, the number of possible merge actions is [20]

$$\binom{N_H(k)}{2} = \frac{N_H(k)!}{2!\,[N_H(k) - 2]!} = \frac{N_H(k)[N_H(k) - 1][N_H(k) - 2]!}{2\,[N_H(k) - 2]!} = \frac{N_H(k)[N_H(k) - 1]}{2}.$$

10 Williams' ISE cost-function-based algorithm makes use of Greedy Algorithm B [41].
to this problem [29]. Both of these algorithms assign the “best” reduction action or actions based on the output of a measure function.

Again, consider the problem of reducing an $N_H(k)$-component Gaussian mixture pdf to an $N_R(k)$-component mixture pdf. According to Greedy Algorithm A, one selects, from the $N_H(k) + \{N_H(k)[N_H(k) - 1]/2\}$ possible actions, the $[N_H(k) - N_R(k)]$ reduction actions with the most favorable outputs of some measure function [29]. In the case of the distance and pseudo-distance measure functions of Section 4.1, the $[N_H(k) - N_R(k)]$ best reduction actions are those with the smallest corresponding distance measures (or, for the Correlation Measure and the Hellinger Affinity Measure, the values closest to one). The best $[N_H(k) - N_R(k)]$ reduction actions are then executed, and a reduced-component Gaussian mixture pdf approximation is obtained.
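A minimal Python sketch of Greedy Algorithm A follows. It rests on assumptions the text does not state: scalar mixtures stored as (weight, mean, variance) tuples; the ISE cost of Equation (4.2), between the original mixture and the mixture produced by each single action, as the measure function; renormalization of the weights after a deletion; and a standard moment-preserving merge, assumed here to match the merging equations cited above. The sketch only selects the actions. How conflicting selections (for example, deleting a component that is also part of a selected merge) would be resolved is not specified here.

```python
import math
from itertools import combinations


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def inner(f, g):
    """Closed-form integral of f(x) g(x) dx for scalar Gaussian mixtures."""
    return sum(a * b * gauss_pdf(ma, mb, va + vb) for a, ma, va in f for b, mb, vb in g)


def ise(f, g):
    return inner(f, f) + inner(g, g) - 2.0 * inner(f, g)


def merge(c1, c2):
    """Moment-preserving merge of two weighted scalar components."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def propose_actions(n):
    """All n deletions plus all n(n-1)/2 pair-wise merges."""
    deletions = [("delete", i) for i in range(n)]
    merges = [("merge", i, j) for i, j in combinations(range(n), 2)]
    return deletions + merges


def apply_action(mixture, action):
    """Return the mixture produced by one deletion (renormalized) or one merge."""
    if action[0] == "delete":
        rest = [c for k, c in enumerate(mixture) if k != action[1]]
        total = sum(w for w, _, _ in rest)
        return [(w / total, m, v) for w, m, v in rest]
    _, i, j = action
    rest = [c for k, c in enumerate(mixture) if k not in (i, j)]
    return rest + [merge(mixture[i], mixture[j])]


def greedy_a(mixture, n_r):
    """Select, in a single pass, the N_H - N_R actions with the smallest ISE."""
    actions = propose_actions(len(mixture))
    ranked = sorted(actions, key=lambda a: ise(mixture, apply_action(mixture, a)))
    return ranked[: len(mixture) - n_r]


mixture = [(0.4, -0.2, 1.0), (0.4, 0.2, 1.0), (0.2, 6.0, 1.0)]
print(greedy_a(mixture, 2))
```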

In contrast, Greedy Algorithm B selects the $N_H(k) - N_R(k)$ best actions through an iterative process. At the first iteration, a set of $\{N_H(k) + N_H(k)[N_H(k) - 1]/2\}$ possible mixture reduction actions is proposed. Then, Greedy Algorithm B selects the best reduction action from the set based on some measure function, and executes this reduction action by either deleting the selected mixture component or merging the two selected mixture components. At the next iteration, a new set of possible deletion and merge actions is proposed. Since the number of mixture components was reduced by one in the previous iteration, the number of possible mixture reduction actions is now $[N_H(k) - 1] + \{[N_H(k) - 1][N_H(k) - 2]/2\}$. This process continues until the desired number of reductions, $[N_H(k) - N_R(k)]$, is obtained.
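Greedy Algorithm B can be sketched in the same way. The sketch assumes scalar mixtures of (weight, mean, variance) tuples, uses the ISE cost of Equation (4.2) against the original mixture as the measure function, renormalizes weights after a deletion, and applies a moment-preserving merge; the names are hypothetical. Each pass re-proposes every deletion and pair-wise merge on the current mixture, executes the single best one, and stops at $N_R(k)$ components. Williams' algorithm also pairs Greedy Algorithm B with the ISE cost, but this code is not his implementation.

```python
import math
from itertools import combinations


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def inner(f, g):
    """Closed-form integral of f(x) g(x) dx for scalar Gaussian mixtures."""
    return sum(a * b * gauss_pdf(ma, mb, va + vb) for a, ma, va in f for b, mb, vb in g)


def ise(f, g):
    return inner(f, f) + inner(g, g) - 2.0 * inner(f, g)


def merge(c1, c2):
    """Moment-preserving merge of two weighted scalar components."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def candidates(mixture):
    """Yield every mixture reachable by one deletion or one pair-wise merge."""
    n = len(mixture)
    for i in range(n):
        rest = [c for k, c in enumerate(mixture) if k != i]
        total = sum(w for w, _, _ in rest)
        yield [(w / total, m, v) for w, m, v in rest]
    for i, j in combinations(range(n), 2):
        rest = [c for k, c in enumerate(mixture) if k not in (i, j)]
        yield rest + [merge(mixture[i], mixture[j])]


def greedy_b(original, n_r):
    """Execute the single best reduction per iteration until n_r components remain."""
    if n_r < 1:
        raise ValueError("n_r must be at least 1")
    current = list(original)
    while len(current) > n_r:
        current = min(candidates(current), key=lambda cand: ise(original, cand))
    return current


mixture = [(0.4, -0.2, 1.0), (0.4, 0.2, 1.0), (0.2, 6.0, 1.0)]
print(greedy_b(mixture, 2))
```

Unlike the single-pass selection of Greedy Algorithm A, each decision here is made on the already-reduced mixture, so no conflicting actions can arise.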

4.3 Salmond's Joining & Clustering Algorithms

The Joining and Clustering algorithms are two MRAs for a Bayesian tracking in clutter algorithm. In [30], [31], Salmond develops and tests these algorithms for a scenario of a single target in clutter. The algorithms are applied separately to the task of reducing the number of mixture components of the a posteriori target state pdf while attempting to maintain track on the target.
Before developing his algorithms, Salmond created design objectives, which are paraphrased below [31]:

(i)  The pdf approximations resulting from the algorithms should be of the same form as the approximated pdf; i.e., the approximation should be a multivariate Gaussian mixture.

(ii)  The algorithm should allow the user to specify the number of components in the approximation.

(iii)  Reduction actions will be guided by a predetermined threshold on a cost function that measures the change to the “structure” of the approximated mixture. Reduction should continue until either the threshold is breached or the specified number of components is reached. This cost criterion is chosen since it is computationally tractable¹¹.

(iv)  Intuitively, the approximation should maintain the overall mean and covariance of the original mixture.

(v)  The algorithms should be computationally efficient such that the approximation can take place before the completion of a scan cycle¹².


Pair-wise component merging, carried out by Equation (2.26), is the focus of each iteration of the Joining Algorithm [31]. This process is governed by requirement (iii) from above, so that merging continues until either a preset threshold is breached or the requisite number of reduced components is met [31]. The cost function is given in Equation (4.9), and the threshold T is set by noting that the cost function is bounded above by the dimension, n, of the target state random process vector; simulation results indicate that T = 0.001n.
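The Joining Algorithm loop can be sketched as follows. The sketch assumes scalar components stored as (weight, mean, variance) tuples, a moment-preserving pair-wise merge (taken here to match the merging equations cited above), and hypothetical parameter names; the default threshold applies T = 0.001n with n = 1. Because a moment-preserving merge leaves the overall mean and covariance of the mixture unchanged (requirement (iv)), the overall variance P in Equation (4.9) is computed only once.

```python
from itertools import combinations


def overall_moments(mixture):
    """Overall mean and variance of a scalar Gaussian mixture."""
    total = sum(w for w, _, _ in mixture)
    mean = sum(w * m for w, m, _ in mixture) / total
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in mixture) / total
    return mean, var


def merge(c1, c2):
    """Moment-preserving merge of two weighted scalar components."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def pair_cost(ci, cj, p_cov):
    """d_ij^2 of Equation (4.9) in the scalar case."""
    (p_i, mu_i, _), (p_j, mu_j, _) = ci, cj
    return p_i * p_j / (p_i + p_j) * (mu_i - mu_j) ** 2 / p_cov


def joining_algorithm(mixture, n_r, threshold=0.001):
    """Merge the cheapest pair until n_r components remain or the cheapest
    cost exceeds the threshold."""
    if n_r < 1:
        raise ValueError("n_r must be at least 1")
    current = list(mixture)
    _, p_cov = overall_moments(current)  # unchanged by moment-preserving merges
    while len(current) > n_r:
        costs = {(i, j): pair_cost(current[i], current[j], p_cov)
                 for i, j in combinations(range(len(current)), 2)}
        (i, j), best = min(costs.items(), key=lambda kv: kv[1])
        if best > threshold:
            break  # threshold breached: stop reducing
        merged = merge(current[i], current[j])
        current = [c for k, c in enumerate(current) if k not in (i, j)] + [merged]
    return current


mixture = [(0.4, -0.2, 1.0), (0.4, 0.2, 1.0), (0.2, 6.0, 1.0)]
print(joining_algorithm(mixture, 1), joining_algorithm(mixture, 1, threshold=1.0))
```

With the default threshold, this example mixture is left untouched because its cheapest cost (about 0.0047) already exceeds 0.001; with a looser threshold the algorithm merges down to the requested single component.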


11 This part of requirement (iii) appears to have been the motivation behind Williams' thesis [38]. He suggested that a more robust cost function could be utilized and still be computationally feasible, given the improvement in computing power since Salmond's dissertation.

12 The completion of one scan cycle means that the propagation and measurement update stages (as given in Subsection 2.2.1) have been completed.
The Clustering Algorithm [31] iteratively finds the component with the largest mixture weight and groups with it the other components that are closest to this cluster center (the component with the largest mixture weight at each iteration). Grouping decisions are based on a cost function with the same form as Equation (4.9) but with slightly different components [31]:

$$D_i = \frac{p_i\,p_c}{p_i + p_c}\,(\mu_i - \mu_c)^T P_c^{-1} (\mu_i - \mu_c) \tag{4.10}$$

(the subscript c indicates that the parameters are from the cluster center component). As with the Joining Algorithm, a threshold is used to determine when components should be clustered, but unlike the Joining Algorithm, this threshold is linearly increased if the preset number of reduced components is not achieved at the end of a clustering iteration [31]. The value of this threshold has a nice geometric interpretation as the volume of the hyperellipsoid centered about the mean of the cluster center and encompassing a certain percentage of the components' probability mass [31].
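The clustering loop described above can be sketched as follows. This is a simplified reading of the description, not Salmond's code: it assumes scalar components stored as (weight, mean, variance) tuples, evaluates Equation (4.10) with the cluster center's variance as $P_c$, merges each group with a moment-preserving merge, and restarts clustering from the original mixture with the threshold raised by a fixed, hypothetical step whenever more than $N_R(k)$ components remain.

```python
def clustering_cost(center, comp):
    """D_i of Equation (4.10) in the scalar case (P_c is the center's variance)."""
    p_c, mu_c, var_c = center
    p_i, mu_i, _ = comp
    return p_i * p_c / (p_i + p_c) * (mu_i - mu_c) ** 2 / var_c


def merge_group(group):
    """Moment-preserving merge of a list of weighted scalar components."""
    w = sum(c[0] for c in group)
    m = sum(c[0] * c[1] for c in group) / w
    v = sum(c[0] * (c[2] + (c[1] - m) ** 2) for c in group) / w
    return (w, m, v)


def cluster_once(mixture, threshold):
    """One clustering pass: the heaviest remaining component becomes a center and
    absorbs every remaining component whose cost is within the threshold."""
    pool = sorted(mixture, key=lambda c: c[0], reverse=True)  # largest weight first
    clusters = []
    while pool:
        center, others = pool[0], pool[1:]
        group = [center] + [c for c in others if clustering_cost(center, c) <= threshold]
        pool = [c for c in others if clustering_cost(center, c) > threshold]
        clusters.append(merge_group(group))
    return clusters


def clustering_algorithm(mixture, n_r, threshold, step):
    """Raise the threshold linearly until at most n_r clusters remain."""
    if n_r < 1 or step <= 0.0:
        raise ValueError("need n_r >= 1 and a positive step")
    while True:
        reduced = cluster_once(mixture, threshold)
        if len(reduced) <= n_r:
            return reduced
        threshold += step


mixture = [(0.4, -0.2, 1.0), (0.4, 0.2, 1.0), (0.2, 6.0, 1.0)]
print(clustering_algorithm(mixture, 2, threshold=0.05, step=0.05))
```

The loop terminates because a large enough threshold eventually places every component in a single cluster.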


4.4 Williams' ISE Cost-Function-Based Algorithm


In [38], Williams develops the ISE Initialization and ISE Iterative Optimization MRAs. The ISE Initialization MRA uses the ISE cost function to choose a starting point for the approximate reduced-component mixture parameters. This starting point is then used in the ISE Iterative Optimization MRA, which optimizes the values of the mixture parameters using the gradient of the ISE cost function with respect to each of the three multivariate mixture parameters. However, it was noted in [38] that the mixture parameter values obtained using the Initialization algorithm typically provided a starting point close to the Iteratively Optimized parameter values, so that the added computational load of the optimization algorithm could be avoided by using the Initialization algorithm only [38].

The ISE Initialization MRA combines the benefits of the pdf measure functions used by Alspach, Lainiotis, and Park [1][2][18] and adapts the work of Salmond to create an MRA whose track life surpasses that of the Joining and Clustering algorithms [40] in a tracking scenario of a single target in heavy clutter. Conceptually, the ISE cost function given by Equation (4.2) is similar to the Kolmogorov Variational Distance used by Alspach and the Hellinger Affinity Measure used by Lainiotis and Park in that it takes into consideration the entire pdf of the original and reduced-component mixtures. However, unlike the Kolmogorov Variational Distance and Hellinger Affinity Measure, the ISE cost function does not need to be approximated to obtain a closed-form solution in the case of multivariate Gaussian mixtures.
Williams adopts all of the requirements posed in [31] (listed in Section …), but requirement (iii) is modified to remove thresholding as a criterion to stop reduction. The merging Equations (2.26) developed in [31] are used to combine mixture components, but clustering is avoided [38].


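A flowchart summarizing the ISE Initialization MRA (which uses Greedy Algorithm B) is shown in Figure 4.2 (reproduced from [38]). As in the Joining Algorithm, each pair of components is potentially merged using (2.26). The cost of these merging actions is evaluated using the ISE cost function, which reduces to (keeping most of Williams’ original notation)

$$
\begin{aligned}
J &= J_{HH} - 2J_{HR} + J_{RR} \\
J_{HR} &= \sum_{i=1}^{N_H(k)} \sum_{j=1}^{N_R(k)} p_i\,\bar{p}_j\,\mathcal{N}\{\mu_i;\ \bar{\mu}_j,\ P_i + \bar{P}_j\} \\
J_{RR} &= \sum_{i=1}^{N_R(k)} \sum_{j=1}^{N_R(k)} \bar{p}_i\,\bar{p}_j\,\mathcal{N}\{\bar{\mu}_i;\ \bar{\mu}_j,\ \bar{P}_i + \bar{P}_j\} \\
J_{HH} &= \sum_{i=1}^{N_H(k)} \sum_{j=1}^{N_H(k)} p_i\,p_j\,\mathcal{N}\{\mu_i;\ \mu_j,\ P_i + P_j\}
\end{aligned}
$$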


where the bars over the parameters indicate those of the reduced mixture, $N_H(k)$ and $N_R(k)$ are the numbers of original and reduced mixture components, respectively, at the end of scan $k$, and $\mathcal{N}\{\cdot\}$ is a multivariate Gaussian pdf with the specified functional arguments in brackets¹³.

Figure 4.2: A flowchart for the ISE cost-function-based Initialization algorithm (which uses Greedy Algorithm B).


Unlike the Joining Algorithm, the cost is also evaluated for deleting each component. Of all the potential reduction actions, the one with the lowest cost is executed. This process continues until the preset number of reduced mixture components is met (as illustrated in Figure 4.2), in contrast to Salmond’s Joining Algorithm, which may continue merging components below this number as long as the threshold is not exceeded [31].
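Williams' procedure can be sketched for univariate mixtures as follows. This is a minimal illustration, not Williams' code: the merge uses the standard moment-preserving formulas (assumed here to match Equation (2.26)), a deletion re-normalizes the surviving weights, and every candidate is scored against the original mixture with $J = J_{HH} - 2J_{HR} + J_{RR}$; all function names are hypothetical.

```python
import math
from itertools import combinations

def gauss(x, mean, var):
    """Univariate Gaussian pdf N{x; mean, var}."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def inner(a, b):
    """Closed-form <f_a, f_b> for univariate mixtures given as [(w, m, v), ...]."""
    return sum(wi * wj * gauss(mi, mj, vi + vj)
               for wi, mi, vi in a for wj, mj, vj in b)

def ise(orig, red):
    """J = J_HH - 2 J_HR + J_RR (univariate case)."""
    return inner(orig, orig) - 2.0 * inner(orig, red) + inner(red, red)

def merge(c1, c2):
    """Moment-preserving merge of two components, assumed to match Eq. (2.26)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)

def delete(mix, i):
    """Drop component i and re-normalize the survivors (an assumption of this sketch)."""
    rest = [c for k, c in enumerate(mix) if k != i]
    total = sum(w for w, _, _ in rest)
    return [(w / total, m, v) for w, m, v in rest]

def ise_greedy_b(orig, n_target):
    """Greedy Algorithm B: re-score every deletion and pairwise merge after each action."""
    if n_target < 1:
        raise ValueError("n_target must be at least 1")
    current = list(orig)
    while len(current) > n_target:
        candidates = [delete(current, i) for i in range(len(current))]
        for i, j in combinations(range(len(current)), 2):
            rest = [c for k, c in enumerate(current) if k not in (i, j)]
            candidates.append(rest + [merge(current[i], current[j])])
        # Lowest cost wins; cost is measured against the ORIGINAL mixture (assumption).
        current = min(candidates, key=lambda red: ise(orig, red))
    return current

mix = [(0.5, 0.0, 1.0), (0.3, 0.2, 1.0), (0.2, 8.0, 1.0)]
reduced = ise_greedy_b(mix, 2)
print(reduced)  # the two nearby components are merged; the component at 8.0 survives
```

Because every candidate is re-scored on each pass, the cost of one pass grows with the square of the current number of components.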


4.5 Summary

In Chapter II, the Bayesian solution for tracking a target in clutter was shown to lead to
a  Gaussian  mixture  target  state  pdf  in  which  the  number  of  mixture  components 
(hypotheses)  grows  without  bound  over  time.  Implementing  this  solution  requires 
some  type  of  mixture  reduction  algorithm  (MRA)  to  limit  the  number  of  hypotheses 
to  a  manageable  number  while  maintaining  good  tracking  performance.  The  choice 
of  measure  function  used  as  a  criterion  to  make  mixture  reduction  decisions  has  a 
major  impact  on  tracking  performance  because  it  is  a  key  element  of  an  MRA.  True 
distance  and  pseudo-distance  measure  functions  were  defined,  and  a  comparison  of  the 
geometric  interpretation  of  each  distance  measure  was  made  (when  applicable),  along 
with  a  judgment  of  the  measure  function’s  suitability  for  practical  implementation. 
The  task  of  reducing  the  number  of  components  of  a  multivariate  Gaussian  mixture 
pdf  by  utilizing  a  measure  function  as  the  reduction  decision  criterion  is  an  assignment 
problem.  Two  heuristic  assignment  algorithms,  Greedy  Algorithm  A  and  Greedy 
Algorithm  B,  were  presented  as  candidate  solutions  to  this  problem.  Two  existing 
MRAs,  Salmond’s  Joining  and  Clustering  algorithms  and  Williams’  ISE  cost-function- 
based  algorithm,  were  summarized  because  they  form  the  basis  for  this  thesis. 


13 This notation may be a little confusing, since the mean, which is a deterministic vector, is used as the random vector of the pdf. The $\mathcal{N}\{\cdot\}$ terms should be interpreted as a functional form and not an actual pdf.
V. Gaussian Mixture Reduction Algorithm Development & Analysis


Several mixture reduction algorithms (MRAs) are developed in this chapter¹. Each MRA is composed of two essential pieces: the measure function used as a decision criterion for selecting mixture reduction actions, and the assignment algorithm that uses the measure function to decide which actions to execute. The most suitable measure functions from Chapters III and IV are mated with one of the assignment algorithms of Chapter IV to produce each new MRA. Various univariate Gaussian mixture pdfs are used to test the new MRAs, and the results are analyzed to identify the best candidates for implementation in a full-scale Bayesian tracking algorithm for use in the presence of measurement origin uncertainty.


5.1  Measure  Function  &  Assignment  Algorithm  Selection 

Before developing an MRA, a suitable measure function must be chosen and coupled with either Greedy Algorithm A or Greedy Algorithm B from Section 4.2. A measure function is considered suitable if it can be evaluated exactly or approximately in closed form when the pdfs are multivariate Gaussian mixtures, and if the interpretation of its results is unambiguous. Either Greedy assignment algorithm may be used with any suitable measure function. Salmond’s Joining Algorithm cost function is not considered because it does not account for the effects of a reduction action on the entire target state pdf. However, Williams thoroughly compared his Integral Square Error cost function with Salmond’s Joining Algorithm cost function in [38][40].


Excluding Salmond’s Joining Algorithm cost function, the nine measure functions presented in Chapters III and IV will be considered for implementation in new MRAs:

1. MLE measure (Equation (3.26))


1 The sample-based multivariate Gaussian mixture approximation method using the EM algorithm suggested in Chapter III is not considered in this chapter.
2. Kolmogorov Variational Distance (Equation (4.1))

3. Integral Square Error (ISE) cost function (Equation (4.2))

4. Expected Likelihood Kernel (Equation (4.3))

5. Hellinger Distance (Equation (4.4))

6. Correlation Measure (Equation (4.5))

7. Hellinger Affinity Measure (Equation (4.6))

8. Kullback-Leibler Mean Information (Equation (4.7))

9. Kullback-Leibler Divergence (Equation (4.8)).

A number of these measure functions are unsuitable. Exact or approximate closed-form evaluation of measure functions 2, 8, and 9 is extremely difficult, if not impossible, when the arguments of these functions are Gaussian mixture pdfs, as noted in the literature [15], [38], so these measure functions are not suitable. The MLE measure function is discarded for the same reason. As pointed out in Subsection 4.1.1.2, the interpretation of results generated by the Expected Likelihood Kernel is somewhat ambiguous, so this measure function is also deemed unsuitable.

To see this point, consider a case in which the Expected Likelihood Kernel of two pdfs is small. This result would imply that the two pdfs are dissimilar based on the concept of orthogonality. However, a large measure value does not necessarily correspond to a high degree of similarity between the two pdfs under consideration. Figure 5.1 shows two five-component univariate Gaussian mixture pdfs that differ only in their mixture weights. The parameters for the first mixture pdf (the solid trace in the figure) and the second mixture pdf (the dash-dotted trace in the figure) are shown in Table 5.1. The Expected Likelihood Kernel of the solid-trace mixture pdf with itself is 0.075159, while the same measure between the solid-trace and dash-dotted-trace pdfs is 0.075484, which is larger. This can happen because the kernel is not normalized by the lengths (norms) of the two pdfs, so $\langle f, g\rangle$ may exceed $\langle f, f\rangle$ even when $g \neq f$. Thus, a larger Expected Likelihood Kernel value does not necessarily imply a better match between pdfs.
[Figure: Two Univariate Gaussian Mixture pdfs]

Figure 5.1: Two distinct five-component univariate Gaussian mixture pdfs with the same mean and variance parameters but different mixture weight parameters. The Expected Likelihood Kernel of the solid-trace mixture with itself is 0.075159, while the same measure between the solid-trace and dash-dotted-trace mixture pdfs is 0.075484.


Parameter   First pdf                        Second pdf
Weight      [0.3, 0.1, 0.1, 0.2, 0.3]^T      [0.3, 0.2, 0.1, 0.15, 0.25]^T
Mean        [-1, 2, 6, 10, 5]^T              [-1, 2, 6, 10, 5]^T
Variance    [1, 4, 1, 3, 3]^T                [1, 4, 1, 3, 3]^T

Table 5.1: Mixture parameters for the two univariate Gaussian mixture pdfs in Figure 5.1.
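The Expected Likelihood Kernel values quoted above can be reproduced from the closed form $\langle f_a, f_b\rangle = \sum_i \sum_j a_i b_j\, \mathcal{N}\{\mu_i;\ \mu_j,\ \sigma_i^2 + \sigma_j^2\}$ applied to the Table 5.1 parameters. The plain-Python sketch below (function names are hypothetical) also computes the normalized ratio used by the Correlation Measure, which is one for the self-comparison and below one for the two different mixtures.

```python
import math

def gauss(x, mean, var):
    """Univariate Gaussian pdf N{x; mean, var}."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def elk(w_a, w_b, means, variances):
    """Expected Likelihood Kernel <f_a, f_b> of two univariate mixtures
    that share component means and variances (as in Table 5.1)."""
    total = 0.0
    for wi, mi, vi in zip(w_a, means, variances):
        for wj, mj, vj in zip(w_b, means, variances):
            total += wi * wj * gauss(mi, mj, vi + vj)
    return total

def correlation_measure(w_a, w_b, means, variances):
    """Cross term divided by the product of the two norms."""
    cross = elk(w_a, w_b, means, variances)
    return cross / math.sqrt(elk(w_a, w_a, means, variances)
                             * elk(w_b, w_b, means, variances))

means = [-1, 2, 6, 10, 5]
variances = [1, 4, 1, 3, 3]
w_first = [0.3, 0.1, 0.1, 0.2, 0.3]
w_second = [0.3, 0.2, 0.1, 0.15, 0.25]

print(round(elk(w_first, w_first, means, variances), 6))   # 0.075159
print(round(elk(w_first, w_second, means, variances), 6))  # 0.075484
print(correlation_measure(w_first, w_second, means, variances))
```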
Measure functions 3, 5, 6, and 7 are considered suitable for implementation in an MRA. Small values of the ISE cost function and Hellinger Distance indicate that the two pdfs under consideration are well matched, while values close to one imply that the two pdfs are similar when using the Correlation Measure or Hellinger Affinity Measure². An exact closed-form solution of the ISE cost function for multivariate Gaussian mixture pdfs was derived in [38][40][41], and an exact closed-form solution for the Correlation Measure also exists, since the basic terms used in this measure function are also used in the ISE cost function. Although exact closed-form evaluations of the Hellinger Distance and Hellinger Affinity Measure may not exist for multivariate Gaussian mixture pdfs [15][38], approximate closed-form representations of these measure functions may be found using a truncated binomial series approximation or the “heuristic approximation” cited in [15].

Of  all  the  possible  pairings  of  measure  functions  with  assignment  algorithms, 
the  following  set  of  new  MRAs  will  be  investigated: 


1.  ISE  cost  function  mated  with  Greedy  Algorithm  A 

2.  Correlation  Measure  mated  with  Greedy  Algorithm  B 

3.  Hellinger  Distance  mated  with  Greedy  Algorithm  B 

4.  Hellinger  Affinity  Measure  mated  with  Greedy  Algorithm  B. 

The first MRA is a modification of Williams’ ISE cost-function-based MRA that replaces Greedy Algorithm B with Greedy Algorithm A; it is termed the ISE Shotgun MRA. The second new MRA is created by replacing the ISE cost function with the Correlation Measure and modifying the decision logic to select the measure closest to one instead of the one closest to zero. This new MRA is the Correlation Measure (CM) MRA. The final two new MRAs, the Hellinger Distance (HD) and Hellinger Affinity Measure (HA) MRAs, are implemented using either the truncated binomial series approximation or the “heuristic approximation” derived in the next section. The performance of each new MRA will be compared to that of Williams’ ISE MRA, and a final comparison of the best-performing new MRA will be made with the ISE MRA.

2 In Subsection 4.1.1, the Correlation Measure and Hellinger Affinity Measure were described as the cosine of the angle between two vectors in Hilbert space. A measure value close to one indicates that the angle between the vectors representing the pdfs (using the Hilbert space vector analogy) is nearly zero, so the two pdfs are similar.


5.2  Mixture  Reduction  Algorithm  Development 

This  section  presents  the  complete  development  of  the  four  new  MRAs  listed  at 
the  end  of  the  previous  section.  Exact  closed-form  solutions  for  the  ISE  cost  function 
and  Correlation  Measure  in  the  case  of  multivariate  Gaussian  mixture  pdfs  are  shown, 
and  approximate  closed-form  results  for  the  Hellinger  Distance  and  Hellinger  Affinity 
Measure  are  derived  using  two  different  approximations.  The  process  of  proposing 
deletion  and  merge  reduction  actions  is  introduced.  Finally,  the  selected  measure 
functions are mated to their corresponding assignment algorithms, and detailed descriptions of the four new MRAs are given.


5.2.1 Closed-Form Solutions of Select Measure Functions. Exact closed-form solutions for the ISE cost function and Correlation Measure, and approximate
closed-form  solutions  for  the  Hellinger  Distance  and  Hellinger  Affinity  Measure  are 


derived  in  this  subsection.  In  [38],  Williams  demonstrated  that  the  product  of  two 
multivariate  Gaussian  pdfs  results  in  another  multivariate  Gaussian  pdf,  and  this 


result  is  used  to  evaluate  his  ISE  cost  function.  In  [15],  the  authors  present  the 
general  probability  product  kernel  for  Gaussian  pdfs  which  provides  a  generalized 
version of the result found in [38]. These two results are used to develop the “heuristic


approximation”  suggested  in  [15]  for  the  Hellinger  Distance  and  Hellinger  Affinity 
Measure.  A  binomial  series  representation  for  the  square  root  function  in  the  two 
Hellinger  measures  is  used  to  obtain  the  other  approximate  closed-form  solution  for 
these  measures. 
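5.2.1.1 ISE Cost Function & Correlation Measure Closed-Form Solutions. Since the ISE cost function,

$$
\begin{aligned}
T_{ISE}\{f(x|\Omega_0), f(x|\Omega)\} &= \langle f(x|\Omega_0) - f(x|\Omega),\ f(x|\Omega_0) - f(x|\Omega)\rangle \\
&= \int_{-\infty}^{\infty} \left[f(x|\Omega_0) - f(x|\Omega)\right]^2 dx \\
&= \int_{-\infty}^{\infty} \left[f^2(x|\Omega_0) + f^2(x|\Omega) - 2 f(x|\Omega_0) f(x|\Omega)\right] dx \\
&= \langle f(x|\Omega_0), f(x|\Omega_0)\rangle + \langle f(x|\Omega), f(x|\Omega)\rangle - 2\langle f(x|\Omega_0), f(x|\Omega)\rangle,
\end{aligned}
$$

and the Correlation Measure,

$$
T_C\{f(x|\Omega_0), f(x|\Omega)\} = \frac{\langle f(x|\Omega_0), f(x|\Omega)\rangle}{\sqrt{\langle f(x|\Omega_0), f(x|\Omega_0)\rangle\,\langle f(x|\Omega), f(x|\Omega)\rangle}} = \frac{\int_{-\infty}^{\infty} f(x|\Omega_0) f(x|\Omega)\,dx}{\sqrt{\int_{-\infty}^{\infty} f^2(x|\Omega_0)\,dx \int_{-\infty}^{\infty} f^2(x|\Omega)\,dx}},
$$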

share the same two self-likeness terms and the same cross-likeness term, the evaluations required for their exact closed-form solutions when the pdfs are multivariate Gaussian mixtures are the same. Although Williams’ closed-form result may be applied directly to both measures, the general probability product kernel for multivariate Gaussian pdfs given in [15] is used instead, for two reasons. First, the general probability product kernel is needed to derive the “heuristic approximation” for the Hellinger measures in the next subsection. Second, the result produced by the general probability product kernel may be checked against Williams’ solution to validate it (at least in one case).
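The general probability product kernel for two multivariate Gaussian pdfs and general $\rho$ is given as (note that $|\cdot| = \det(\cdot)$)

$$
\begin{aligned}
K_\rho\{f(x|\mu_0,P_0), f(x|\mu,P)\} &= \langle f^\rho(x|\mu_0,P_0),\ f^\rho(x|\mu,P)\rangle \\
&= (2\pi)^{(1-2\rho)n/2}\,\rho^{-n/2}\,|P^\dagger|^{1/2}\,|P_0|^{-\rho/2}\,|P|^{-\rho/2} \\
&\quad\cdot \exp\left\{-\frac{\rho}{2}\left[\mu_0^T P_0^{-1}\mu_0 + \mu^T P^{-1}\mu - \mu^{\dagger T} P^\dagger \mu^\dagger\right]\right\}
\end{aligned}
\tag{5.1}
$$

where

$$
P^\dagger = \left(P_0^{-1} + P^{-1}\right)^{-1}, \qquad \mu^\dagger = P_0^{-1}\mu_0 + P^{-1}\mu.
$$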

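In Equation (5.1), note that $P^{\dagger}$ is formed from the inverse covariances $P_0^{-1}$ and $P^{-1}$, and that $\mu^{\dagger}$ is not a mean. Recall that a multivariate Gaussian mixture pdf has the form

$$
f(x|\Omega) = \sum_{i=1}^{M} p_i\, f(x|\mu_i, P_i) \tag{2.23}
$$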

so that the original, full-component single-target state multivariate Gaussian mixture pdf may be written as

$$
f(x(k)|\Omega_0(k)) = \sum_{i=1}^{N_H(k)} p_{0,i}\, f(x(k)|\mu_{0,i}, P_{0,i}) \tag{5.2}
$$

and the approximated, reduced-component single-target state multivariate Gaussian mixture pdf is

$$
f(x(k)|\Omega(k)) = \sum_{j=1}^{N_R(k)} p_j\, f(x(k)|\mu_j, P_j). \tag{5.3}
$$

Notice that the time dependence on $k$ was added, since the single-target state is modeled as a random process vector. At sample $k$, the random process vector $x(k)$ is simply a random vector with a multivariate Gaussian mixture pdf whose parameters are given in $\Omega_0(k)$ or $\Omega(k)$. In contrast, the time dependence is not explicitly shown in the mixture component parameters $p_{0,i}$, $\mu_{0,i}$, $P_{0,i}$, $p_j$, $\mu_j$, and $P_j$, to enhance readability, but it should be understood that these values generally change from sample to sample. Substituting Equations (5.2) and (5.3) into the inner products and applying Equation (5.1) with $\rho = 1$ to each pair of components yields the closed-form solution for the cross-likeness term


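$$
\langle f(x(k)|\Omega_0(k)),\ f(x(k)|\Omega(k))\rangle = \sum_{i=1}^{N_H(k)}\sum_{j=1}^{N_R(k)} p_{0,i}\,p_j\,K_1\{f(x(k)|\mu_{0,i},P_{0,i}),\ f(x(k)|\mu_j,P_j)\}. \tag{5.4}
$$

Likewise, the full-component mixture self-likeness term is

$$
\langle f(x(k)|\Omega_0(k)),\ f(x(k)|\Omega_0(k))\rangle = \sum_{i=1}^{N_H(k)}\sum_{j=1}^{N_H(k)} p_{0,i}\,p_{0,j}\,K_1\{f(x(k)|\mu_{0,i},P_{0,i}),\ f(x(k)|\mu_{0,j},P_{0,j})\} \tag{5.5}
$$

and the reduced-component mixture self-likeness term is

$$
\langle f(x(k)|\Omega(k)),\ f(x(k)|\Omega(k))\rangle = \sum_{i=1}^{N_R(k)}\sum_{j=1}^{N_R(k)} p_i\,p_j\,K_1\{f(x(k)|\mu_i,P_i),\ f(x(k)|\mu_j,P_j)\}. \tag{5.6}
$$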
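Finally, for $\rho = 1$, Equation (5.1) evaluates to³

$$
\begin{aligned}
K_1\{f(x|\mu_0,P_0), f(x|\mu,P)\} &= (2\pi)^{-n/2}\,|P_0^{-1}+P^{-1}|^{-1/2}\,|P_0|^{-1/2}\,|P|^{-1/2} \\
&\quad\cdot \exp\left\{-\frac{1}{2}\left[\mu_0^T P_0^{-1}\mu_0 + \mu^T P^{-1}\mu - (P_0^{-1}\mu_0 + P^{-1}\mu)^T (P_0^{-1}+P^{-1})^{-1} (P_0^{-1}\mu_0 + P^{-1}\mu)\right]\right\} \\
&= |2\pi(P_0+P)|^{-1/2} \exp\left\{-\frac{1}{2}\left[\mu_0^T P_0^{-1}\mu_0 + \mu^T P^{-1}\mu - \mu_0^T P_0^{-1}P(P_0+P)^{-1}\mu_0 - \mu^T P^{-1}P_0(P_0+P)^{-1}\mu - 2\mu_0^T(P_0+P)^{-1}\mu\right]\right\} \\
&= |2\pi(P_0+P)|^{-1/2} \exp\left\{-\frac{1}{2}\left[\mu_0^T\left(P_0^{-1} - P_0^{-1}P(P_0+P)^{-1}\right)\mu_0 + \mu^T\left(P^{-1} - P^{-1}P_0(P_0+P)^{-1}\right)\mu - 2\mu_0^T(P_0+P)^{-1}\mu\right]\right\} \\
&= |2\pi(P_0+P)|^{-1/2} \exp\left\{-\frac{1}{2}\left[\mu_0^T(P_0+P)^{-1}\mu_0 + \mu^T(P_0+P)^{-1}\mu - 2\mu_0^T(P_0+P)^{-1}\mu\right]\right\} \\
&= |2\pi(P_0+P)|^{-1/2} \exp\left\{-\frac{1}{2}(\mu_0-\mu)^T(P_0+P)^{-1}(\mu_0-\mu)\right\}
\end{aligned}
\tag{5.7}
$$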


which matches the result found in [38]. Thus, the ISE cost function and Correlation Measure are fully specified by Equations (5.4), (5.5), and (5.6), and the probability product kernel for $\rho = 1$,

$$
K_1\{f(x|\mu_0,P_0), f(x|\mu,P)\} = |2\pi(P_0+P)|^{-1/2}\exp\left\{-\frac{1}{2}(\mu_0-\mu)^T(P_0+P)^{-1}(\mu_0-\mu)\right\}.
$$

3 This result is derived using the following linear algebra properties and relations [8][21][34]:

(a) $|A|^{-1} = \frac{1}{|A|} = |A^{-1}|$

(b) $|AB| = |A|\,|B|$

(c) $|cA| = c^n |A|$, where $c$ is a scalar and $n$ is the number of rows or columns of $A$

(d) If $A$ and $B$ are symmetric positive definite, then $A^T = A$, $AB = (BA)^T$, $(A^{-1})^T = A^{-1}$, and $x^T A^{-1} y = y^T A^{-1} x$

(e) If these inverses exist, and $A$ and $B$ are symmetric, then $(A^{-1} + B^{-1})^{-1} = A - A(A+B)^{-1}A = B - B(A+B)^{-1}B = A(A+B)^{-1}B = B(A+B)^{-1}A$
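With Equation (5.7) in hand, both measures reduce to three double sums. The NumPy sketch below evaluates Equations (5.4) to (5.6) and combines them into the ISE cost and the Correlation Measure; it is a minimal illustration with hypothetical names and example mixtures, not the thesis code. The last usage line scales the reduced mixture's weights to show that the Correlation Measure does not change, the property used later when re-normalization is deferred.

```python
import numpy as np

def k1(mu0, P0, mu, P):
    """Eq. (5.7): |2 pi (P0 + P)|^(-1/2) exp{-1/2 (mu0 - mu)^T (P0 + P)^(-1) (mu0 - mu)}."""
    S = np.asarray(P0, float) + np.asarray(P, float)
    d = np.asarray(mu0, float) - np.asarray(mu, float)
    return float(np.exp(-0.5 * d @ np.linalg.solve(S, d))
                 / np.sqrt(np.linalg.det(2.0 * np.pi * S)))

def likeness(mix_a, mix_b):
    """Double sum of Eqs. (5.4)-(5.6); each mixture is (weights, means, covariances)."""
    wa, ma, Pa = mix_a
    wb, mb, Pb = mix_b
    return sum(wa[i] * wb[j] * k1(ma[i], Pa[i], mb[j], Pb[j])
               for i in range(len(wa)) for j in range(len(wb)))

def ise_and_cm(orig, red):
    """Return (ISE cost, Correlation Measure) between two Gaussian mixtures."""
    s00 = likeness(orig, orig)   # full-component self-likeness, Eq. (5.5)
    srr = likeness(red, red)     # reduced-component self-likeness, Eq. (5.6)
    s0r = likeness(orig, red)    # cross-likeness, Eq. (5.4)
    return s00 + srr - 2.0 * s0r, s0r / np.sqrt(s00 * srr)

orig = (np.array([0.6, 0.4]),
        np.array([[0.0, 0.0], [2.0, 1.0]]),
        np.array([np.eye(2), 2.0 * np.eye(2)]))
red = (np.array([1.0]), np.array([[0.8, 0.4]]), np.array([2.5 * np.eye(2)]))

print(ise_and_cm(orig, orig))          # ISE about 0, CM about 1
print(ise_and_cm(orig, red))
scaled = (0.7 * red[0], red[1], red[2])
print(ise_and_cm(orig, scaled)[1])     # same CM as for red
```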


5.2.1.2 Hellinger Distance & Hellinger Affinity Measure Closed-Form Solutions. Two closed-form approximations of the Hellinger Distance and Hellinger Affinity Measure are derived in this subsection. The first closed-form approximation uses a binomial series expansion⁴ of the $\sqrt{f(x|\Omega_0) f(x|\Omega)}$ term in Equations (4.4) and (4.6), and extracts the first-order term from this expansion as the approximation. The second approximation is the “heuristic approximation” suggested in [15], which replaces the square root of a sum of terms with the sum of the square roots of the terms (notionally, this approximation is $\sqrt{a+b+c} \approx \sqrt{a} + \sqrt{b} + \sqrt{c}$).


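To use the binomial series to expand $\sqrt{f(x|\Omega_0) f(x|\Omega)}$, first note that this term may be equivalently written as $[1 + (f(x|\Omega_0) f(x|\Omega) - 1)]^{1/2}$. Then, the binomial series for this expression is

$$
\left[1 + \left(f(x|\Omega_0) f(x|\Omega) - 1\right)\right]^{1/2} = \sum_{u=0}^{\infty} \frac{\frac{1}{2}\left(\frac{1}{2}-1\right)\cdots\left(\frac{1}{2}-u+1\right)}{u!}\left(f(x|\Omega_0) f(x|\Omega) - 1\right)^u \tag{5.8}
$$

4 The binomial series is given by [3]

$$
(1+y)^v = \sum_{u=0}^{\infty} \frac{v(v-1)\cdots(v-u+1)}{u!}\, y^u
$$

for $v \in \mathbb{R}$ and $|y| < 1$ (for $v = 1/2$ the series also converges at $y = \pm 1$).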
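For $u \ge 1$, the coefficient of the $u$th term is equivalently represented as

$$
\begin{aligned}
\frac{\frac{1}{2}\left(\frac{1}{2}-1\right)\cdots\left(\frac{1}{2}-u+1\right)}{u!} &= \frac{(-1)^{u-1}\,[1\cdot 1\cdot 3\cdots(2u-3)]}{2^u\,u!} \\
&= \frac{(-1)^{u-1}}{2^u\,u!}\cdot\frac{[1\cdot 3\cdots(2u-3)(2u-1)]\,[2\cdot 4\cdots(2u-2)\cdot 2u]}{(2u-1)\,[2\cdot 4\cdots(2u-2)\cdot 2u]} \\
&= \frac{(-1)^{u-1}}{2^u\,u!}\cdot\frac{(2u)!}{2^u\,u!\,(2u-1)} \\
&= \frac{(-1)^{u-1}(2u)!}{2^{2u}(u!)^2(2u-1)}.
\end{aligned}
$$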
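The coefficient identity above is easy to spot-check with exact rational arithmetic. The short Python sketch below (function names are hypothetical) compares the generalized binomial coefficient $\binom{1/2}{u}$, computed from its definition, with the closed form for the first several values of $u$.

```python
from fractions import Fraction
from math import factorial

def binom_half(u):
    """Generalized binomial coefficient C(1/2, u) = (1/2)(1/2 - 1)...(1/2 - u + 1) / u!."""
    c = Fraction(1)
    for k in range(u):
        c *= Fraction(1, 2) - k
    return c / factorial(u)

def closed_form(u):
    """(-1)^(u-1) (2u)! / (2^(2u) (u!)^2 (2u - 1)), valid for u >= 1."""
    return Fraction((-1) ** (u - 1) * factorial(2 * u),
                    2 ** (2 * u) * factorial(u) ** 2 * (2 * u - 1))

assert all(binom_half(u) == closed_form(u) for u in range(1, 12))
print([str(binom_half(u)) for u in range(1, 5)])  # ['1/2', '-1/8', '1/16', '-5/128']
```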

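Substituting this expression into Equation (5.8) and writing out the first two terms in the series yields

$$
\begin{aligned}
\left[1 + \left(f(x|\Omega_0) f(x|\Omega) - 1\right)\right]^{1/2} &= 1 + \sum_{u=1}^{\infty}\frac{(-1)^{u-1}(2u)!}{2^{2u}(u!)^2(2u-1)}\left(f(x|\Omega_0) f(x|\Omega) - 1\right)^u \\
&= 1 + \frac{1}{2}\left(f(x|\Omega_0) f(x|\Omega) - 1\right) + \cdots \\
&\approx \frac{1}{2}\, f(x|\Omega_0)\, f(x|\Omega).
\end{aligned}
\tag{5.9}
$$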


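This approximation uses the first-order term of the expansion and discards all other terms⁵. Finally, the approximate Hellinger Distance becomes

$$
\begin{aligned}
T_H\{f(x|\Omega_0), f(x|\Omega)\} &= \sqrt{1 - \left\langle \sqrt{f(x|\Omega_0)},\ \sqrt{f(x|\Omega)}\right\rangle} = \sqrt{1 - \int_{-\infty}^{\infty}\sqrt{f(x|\Omega_0) f(x|\Omega)}\,dx} \\
&\approx \sqrt{1 - \frac{1}{2}\int_{-\infty}^{\infty} f(x|\Omega_0) f(x|\Omega)\,dx}
\end{aligned}
\tag{5.10}
$$

5 The zeroth-order term is discarded since keeping it would produce the divergent integral $\int_{-\infty}^{\infty} (1/2)\,dx$.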


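and the Hellinger Affinity Measure is approximated as

$$
T_A\{f(x|\Omega_0), f(x|\Omega)\} = \left\langle \sqrt{f(x|\Omega_0)},\ \sqrt{f(x|\Omega)}\right\rangle = \int_{-\infty}^{\infty}\sqrt{f(x|\Omega_0) f(x|\Omega)}\,dx \approx \frac{1}{2}\int_{-\infty}^{\infty} f(x|\Omega_0) f(x|\Omega)\,dx. \tag{5.11}
$$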


Notice that the integral term in both Equations (5.10) and (5.11) is the cross-likeness term of the ISE cost function, and it may be evaluated using Equations (5.4) and (5.7). Also, this approximation is functionally a scaled version of the Expected Likelihood Kernel (Equation (4.3)), but conceptually it was derived from an approximation of the Hellinger Affinity Measure, which is a true distance measure.
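Because the first-order approximation only needs the cross-likeness term, it inherits the closed form of Equations (5.4) and (5.7). A univariate sketch follows; the mixtures and function names are hypothetical. For two identical $\mathcal{N}(0,1)$ pdfs the approximation returns $1/(4\sqrt{\pi}) \approx 0.141$ rather than one, which is consistent with its being a scaled Expected Likelihood Kernel.

```python
import math

def gauss(x, mean, var):
    """Univariate Gaussian pdf N{x; mean, var}."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def cross_likeness(a, b):
    """Closed-form integral of f_a * f_b for univariate mixtures [(w, m, v), ...],
    i.e. the univariate case of Eqs. (5.4) and (5.7)."""
    return sum(wi * wj * gauss(mi, mj, vi + vj)
               for wi, mi, vi in a for wj, mj, vj in b)

def affinity_binomial(a, b):
    """First-order binomial approximation of the Hellinger Affinity, Eq. (5.11)."""
    return 0.5 * cross_likeness(a, b)

def distance_binomial(a, b):
    """First-order binomial approximation of the Hellinger Distance, Eq. (5.10).
    math.sqrt raises ValueError if the cross-likeness term exceeds 2."""
    return math.sqrt(1.0 - 0.5 * cross_likeness(a, b))

orig = [(0.7, 0.0, 1.0), (0.3, 3.0, 2.0)]
red = [(1.0, 0.9, 2.0)]
print(affinity_binomial(orig, red), distance_binomial(orig, red))
print(affinity_binomial([(1.0, 0.0, 1.0)], [(1.0, 0.0, 1.0)]))  # about 0.141, not 1
```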

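The “heuristic approximation” is derived by approximating the integrand in Equations (4.4) and (4.6) as a sum of square roots and using the general probability product kernel in Equation (5.1) with $\rho = 1/2$. Begin by writing the term

$$
\begin{aligned}
\int_{-\infty}^{\infty}\sqrt{f(x(k)|\Omega_0(k))\,f(x(k)|\Omega(k))}\,dx(k) &= \int_{-\infty}^{\infty}\sqrt{\sum_{i=1}^{N_H(k)}\sum_{j=1}^{N_R(k)} p_{0,i}\,p_j\, f(x(k)|\mu_{0,i},P_{0,i})\, f(x(k)|\mu_j,P_j)}\;dx(k) \\
&\approx \sum_{i=1}^{N_H(k)}\sum_{j=1}^{N_R(k)} \sqrt{p_{0,i}\,p_j}\int_{-\infty}^{\infty}\sqrt{f(x(k)|\mu_{0,i},P_{0,i})\, f(x(k)|\mu_j,P_j)}\;dx(k) \\
&= \sum_{i=1}^{N_H(k)}\sum_{j=1}^{N_R(k)} \sqrt{p_{0,i}\,p_j}\; K_{1/2}\{f(x(k)|\mu_{0,i},P_{0,i}),\ f(x(k)|\mu_j,P_j)\}.
\end{aligned}
\tag{5.12}
$$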
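Next, evaluate Equation (5.1) with $\rho$ set to one-half:

$$
\begin{aligned}
K_{1/2}\{f(x|\mu_0,P_0), f(x|\mu,P)\} &= (2\pi)^0\left(\tfrac{1}{2}\right)^{-n/2} |P_0^{-1}+P^{-1}|^{-1/2}\,|P_0|^{-1/4}\,|P|^{-1/4} \\
&\quad\cdot\exp\left\{-\frac{1}{4}\left[\mu_0^T P_0^{-1}\mu_0 + \mu^T P^{-1}\mu - \mu^{\dagger T}P^\dagger\mu^\dagger\right]\right\} \\
&= \left|\frac{P_0+P}{2}\right|^{-1/2} |P_0 P|^{1/4}\exp\left\{-\frac{1}{4}(\mu_0-\mu)^T(P_0+P)^{-1}(\mu_0-\mu)\right\}.
\end{aligned}
\tag{5.13}
$$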


Finally, the “heuristic approximation” to the Hellinger Distance and Hellinger Affinity Measure is found by substituting Equation (5.13) into Equation (5.12) and then substituting this result into the appropriate measure, Equation (4.4) or Equation (4.6).
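The heuristic approximation can be compared against a direct numerical integral of $\sqrt{f(x|\Omega_0) f(x|\Omega)}$. The univariate sketch below implements Equations (5.12) and (5.13) with $n = 1$ and a midpoint-rule reference integral; the grid limits, step count, example mixtures, and function names are assumptions for illustration. Since $\sqrt{\sum_k a_k} \le \sum_k \sqrt{a_k}$ for nonnegative terms, the heuristic can only overestimate the exact affinity, and it is exact when both mixtures have a single component.

```python
import math

def k_half(m0, v0, m, v):
    """Eq. (5.13) for n = 1: ((v0 + v)/2)^(-1/2) (v0 v)^(1/4) exp{-(m0 - m)^2 / (4 (v0 + v))}."""
    return ((v0 * v) ** 0.25 / math.sqrt((v0 + v) / 2.0)
            * math.exp(-0.25 * (m0 - m) ** 2 / (v0 + v)))

def affinity_heuristic(a, b):
    """Eq. (5.12): sum over component pairs of sqrt(w_i w_j) K_{1/2}."""
    return sum(math.sqrt(wi * wj) * k_half(mi, vi, mj, vj)
               for wi, mi, vi in a for wj, mj, vj in b)

def mixture_pdf(mix, x):
    """Univariate Gaussian mixture pdf evaluated at x."""
    return sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
               for w, m, v in mix)

def affinity_numeric(a, b, lo=-30.0, hi=30.0, steps=60000):
    """Reference value of the integral of sqrt(f_a f_b) by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += math.sqrt(mixture_pdf(a, x) * mixture_pdf(b, x))
    return h * total

orig = [(0.6, 0.0, 1.0), (0.4, 1.0, 2.0)]
red = [(1.0, 0.5, 1.5)]
print(affinity_heuristic(orig, red), affinity_numeric(orig, red))
```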


5.2.2 Proposing Mixture Reduction Actions. Now that exact or approximate closed-form solutions to the selected measure functions have been found for multivariate Gaussian mixture pdfs, the process of proposing the reduced-component approximate target state pdf, $f(x(k)|\Omega(k))$, is outlined. Reduced-component mixture pdfs are proposed by either deleting a single component or merging two distinct components of the original full-component Gaussian mixture pdf, $f(x(k)|\Omega_0(k))$, or of the approximated target state pdf from the previous mixture reduction algorithm iteration, depending on which Greedy assignment algorithm is used. At the end of each scan $k$, there are $N_H(k) + N_H(k)[N_H(k) - 1]/2$ possible reduced-component mixture pdfs to propose (see Section 4.2). If a single mixture component is deleted, then the proposed reduced-component pdf is the original target state Gaussian mixture pdf with one mixture component removed. To improve the computational speed of the algorithm, the mixture weights of the reduced-component mixture pdf are not re-normalized until the MRA has reached the requisite number of components, as discussed in the next subsection. However, this improvement in run time alters the output of measure functions that calculate the length of the error vector between the original and reduced-component mixture pdfs (using the Hilbert space vector analogy of the previous chapter), such as
the ISE cost function and the Hellinger Distance. This issue is covered in Subsection 5.3.1. If two mixture components are merged, then the resulting approximate target state Gaussian mixture pdf is the original pdf with the two merged components replaced by a single Gaussian mixture component whose parameters are specified by Equation (2.26). Re-normalization is not an issue here, since the merged-component mixture weight is simply the sum of the two merged components’ mixture weights. So, as long as the original mixture weights sum to one, the reduced-component mixture weights also sum to one.
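The set of candidate reductions can be enumerated directly. The Python sketch below, for univariate components stored as hypothetical `(weight, mean, variance)` tuples, yields every single deletion (without re-normalization) and every pairwise merge, and checks the count $N + N(N-1)/2$. The merge uses the standard moment-preserving formulas, which are assumed here to match Equation (2.26).

```python
from itertools import combinations

def merge(c1, c2):
    """Moment-preserving merge of two univariate components (w, m, v);
    assumed here to match the merging equations (2.26)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)

def propose_reductions(mix):
    """Yield ((kind, indices), reduced_mixture) for every single deletion and
    pairwise merge. Deleted mixtures are NOT re-normalized."""
    for i in range(len(mix)):
        yield ("delete", (i,)), [c for k, c in enumerate(mix) if k != i]
    for i, j in combinations(range(len(mix)), 2):
        rest = [c for k, c in enumerate(mix) if k not in (i, j)]
        yield ("merge", (i, j)), rest + [merge(mix[i], mix[j])]

mix = [(0.4, 0.0, 1.0), (0.3, 1.0, 2.0), (0.2, 4.0, 1.0), (0.1, 9.0, 3.0)]
proposals = list(propose_reductions(mix))
N = len(mix)
assert len(proposals) == N + N * (N - 1) // 2   # 4 deletions + 6 merges
```

Every merge proposal keeps a total weight of one, while every deletion proposal has total weight $1 - p_d$ until the final re-normalization.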

5.2.3  Mating  Measure  Functions  with  Assignment  Algorithms.  Mating  a 
measure  function  with  an  assignment  algorithm  produces  an  MRA.  In  this  subsection, 
the four new measure function/assignment algorithm pairings listed at the end of Section 5.1 are expounded. Insights gleaned from [38] and Williams’ original ISE
cost-function-based  MRA  code  are  used  in  the  development  of  the  four  new  MRAs. 

5.2.3.1 Correlation Measure MRA. The CM MRA is simply Williams’ original MRA, but with the ISE cost function replaced by the Correlation Measure and the decision logic modified to accept measure function values closest to one instead of those closest to zero. A flowchart of the algorithm is shown in Figure 5.2. The algorithm begins by computing and storing each Correlation Measure term in Equations (5.4), (5.5), and (5.6). Parameters for all possible pairwise component mergings are also calculated, since they are needed when proposing reduced-component approximate Gaussian mixture pdfs formed by merging components. Reduced-component mixture approximations to the original mixture are proposed by both deletion and merge reduction actions, and the reduced-component mixture with a Correlation Measure closest to one is declared the optimal approximation to the original Gaussian mixture pdf (optimal in the sense of the measure used). This process repeats, with the optimal reduced-component approximate mixture pdf from the previous iteration serving as the “original” mixture pdf for the current iteration, until the desired number of mixture components is obtained. At the final step of the algorithm, the mixture weights of the optimal approximation to the original, full-component target state Gaussian mixture pdf are re-normalized.

Figure 5.2: A flowchart of the Correlation Measure MRA.

If the proposed reduced-component mixture is obtained by deleting a mixture component, then the corresponding Correlation Measure is found by subtracting every measure term in Equations (5.4), (5.5), and (5.6) that corresponds to the deleted component. This calculation is faster if re-normalization of the remaining components’ mixture weights is not considered, since the Correlation Measure terms computed and stored at the start of the algorithm may be reused. Omitting re-normalization of the mixture weights during operation of the algorithm does not degrade the approximation fidelity of MRAs based on measure functions that compute the cosine of the angle between the two vectors representing the original and reduced-component mixtures, since re-normalization only affects the lengths of the vectors and not their directions: scaling every reduced-component weight by a constant $c > 0$ multiplies the cross-likeness term by $c$ and the reduced-component self-likeness term by $c^2$, leaving the Correlation Measure unchanged. This concept is explained in detail in Subsection 5.3.1.

If the proposed reduced-component mixture is obtained by merging two mixture components, then the corresponding Correlation Measure between the full-component pdf (or the approximate mixture pdf resulting from a previous iteration of the algorithm) and the reduced-component pdf is found by a slightly different procedure than for deleting a component. First, the Correlation Measure terms containing the two merged mixture components are subtracted from the sum of the Correlation Measure terms initially calculated. Then, the Correlation Measure terms between the newly formed merged component and every other surviving component are calculated according to Equations (5.4), (5.5), and (5.6). These terms are summed and added to the result of the first step to obtain the Correlation Measure of the reduced-component Gaussian mixture pdf resulting from merging the two selected mixture components. Again, computational efficiency is gained by reusing previously stored measure terms, as in the deletion case.
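The bookkeeping for both reduction actions can be sketched as follows. The class below stores every pairwise weighted term of the current mixture once, then scores a deletion or a merge by subtracting and adding terms; the usage picks the action whose Correlation Measure is closest to one. It is a univariate sketch with hypothetical names, not the thesis code: the moment-preserving merge is assumed to match Equation (2.26), and weights are left un-normalized as described above.

```python
import math
from itertools import combinations

def term(c1, c2):
    """Weighted pairwise term w1*w2*N{m1; m2, v1 + v2} for univariate components (w, m, v)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    s = v1 + v2
    return w1 * w2 * math.exp(-0.5 * (m1 - m2) ** 2 / s) / math.sqrt(2.0 * math.pi * s)

def merge(c1, c2):
    """Moment-preserving merge, assumed to match Eq. (2.26)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)

def cm_direct(a, b):
    """Correlation Measure recomputed from scratch (reference)."""
    ab = sum(term(x, y) for x in a for y in b)
    aa = sum(term(x, y) for x in a for y in a)
    bb = sum(term(x, y) for x in b for y in b)
    return ab / math.sqrt(aa * bb)

class StoredTerms:
    """Pairwise terms of the current mixture, computed once and reused."""

    def __init__(self, mix):
        self.mix = list(mix)
        self.G = [[term(a, b) for b in self.mix] for a in self.mix]
        self.row = [sum(r) for r in self.G]
        self.S = sum(self.row)                          # <f, f>

    def cm_delete(self, d):
        s0r = self.S - self.row[d]                      # <f, f - c_d>
        srr = self.S - 2.0 * self.row[d] + self.G[d][d]  # <f - c_d, f - c_d>
        return s0r / math.sqrt(self.S * srr)

    def cm_merge(self, i, j):
        new = merge(self.mix[i], self.mix[j])
        rest = [c for k, c in enumerate(self.mix) if k not in (i, j)]
        # Stored terms with every row and column of components i and j removed.
        rest_rest = (self.S - 2.0 * (self.row[i] + self.row[j])
                     + self.G[i][i] + self.G[j][j] + 2.0 * self.G[i][j])
        s0r = self.S - self.row[i] - self.row[j] + sum(term(c, new) for c in self.mix)
        srr = rest_rest + 2.0 * sum(term(c, new) for c in rest) + term(new, new)
        return s0r / math.sqrt(self.S * srr)

mix = [(0.4, 0.0, 1.0), (0.3, 1.0, 2.0), (0.2, 4.0, 1.0), (0.1, 9.0, 3.0)]
terms = StoredTerms(mix)
scores = {("delete", d): terms.cm_delete(d) for d in range(len(mix))}
scores.update({("merge", i, j): terms.cm_merge(i, j)
               for i, j in combinations(range(len(mix)), 2)})
best_action = max(scores, key=scores.get)   # Correlation Measure closest to one
print(best_action, scores[best_action])
```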
5.2.3.2 ISE Shotgun MRA. When the ISE cost function is combined with Greedy Algorithm A from Section 4.2, the ISE Shotgun MRA is created. This new MRA sacrifices the quality of the reduced-component approximation provided by Williams' original ISE cost-function-based MRA (which used Greedy Algorithm B), but it improves the computational speed of the algorithm. The source of the acceleration may be seen in the ISE Shotgun MRA flowchart of Figure 5.3. Notice that the algorithm continues to execute the lowest-cost reduction actions based on the initially calculated costs until none remain or the requisite number of reduced components is reached (as indicated by the first decision block in Figure 5.3). In contrast, Williams' original ISE cost-function-based MRA recomputes new costs after each reduction action is executed⁶.
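The difference between the two greedy strategies can be illustrated with a small Python sketch. It assumes that the two greedy algorithms behave as this paragraph describes them (score every reduction action once, versus re-score after every executed action); the candidate set (all single-component deletions and all pairwise moment-preserving merges), the function names, and the skipping rule are hypothetical choices, not a transcription of Section 4.2 or of Williams' code. As in the MRAs described here, mixture weights are left un-normalized during the reduction.

```python
import math
from itertools import combinations


def gauss(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def inner(f, g):
    """L2 inner product of two mixtures of (w, mean, var) components."""
    return sum(wa * wb * gauss(ma, mb, va + vb) for wa, ma, va in f for wb, mb, vb in g)


def ise(f, g):
    """Integral Square Error: squared L2 length of the error vector f - g."""
    return inner(f, f) - 2.0 * inner(f, g) + inner(g, g)


def merge(c1, c2):
    """Moment-preserving merge (assumed merge rule)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    return (w, m, (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w)


def actions(g):
    """Every deletion and pairwise merge as (indices removed, components added)."""
    acts = [((i,), []) for i in range(len(g))]
    acts += [((i, j), [merge(g[i], g[j])]) for i, j in combinations(range(len(g)), 2)]
    return acts


def apply_action(g, removed, added):
    return [c for k, c in enumerate(g) if k not in removed] + added


def reduce_recompute(f, target):
    """Re-score every candidate action after each executed action."""
    g = list(f)
    while len(g) > target:
        removed, added = min(actions(g), key=lambda a: ise(f, apply_action(g, *a)))
        g = apply_action(g, removed, added)
    return g


def reduce_shotgun(f, target):
    """Score all actions once, then execute them in cost order, skipping any
    action that involves a component already deleted or merged."""
    ranked = sorted(actions(f), key=lambda a: ise(f, apply_action(f, *a)))
    pool = dict(enumerate(f))   # untouched original components
    created = []                # merged components (never re-scored)
    for removed, added in ranked:
        if len(pool) + len(created) <= target:
            break
        if all(k in pool for k in removed):
            for k in removed:
                del pool[k]
            created += added
    return list(pool.values()) + created


if __name__ == "__main__":
    f = [(0.2, -3.0, 0.4), (0.1, -2.5, 0.6), (0.25, 0.0, 1.0),
         (0.15, 0.3, 0.5), (0.2, 3.0, 0.7), (0.1, 3.4, 0.3)]
    for name, fn in (("recompute", reduce_recompute), ("shotgun", reduce_shotgun)):
        g = fn(f, 3)
        print(name, len(g), ise(f, g))
```

The shotgun variant evaluates the cost of each candidate action only once, which is where its speed comes from; the price is that later actions are chosen using costs that no longer describe the current mixture.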


⁶In [41], Williams identifies two efficiency enhancements, beyond those incorporated into his original algorithm, which modify the operation of his MRA by eliminating the need to recompute all of the new costs.

Figure 5.3: A flowchart of the ISE Shotgun MRA.

Figure 5.4: A flowchart of the Hellinger Distance MRA.

5.2.3.3 Hellinger Distance MRA. As shown in Figure 5.4, the HD MRA is the same as the CM MRA except that the appropriate approximation to the Hellinger Distance replaces the Correlation Measure as the measure function, and the reduced-component mixture pdf with the smallest distance with respect to the original mixture pdf is taken as the optimal approximation. However, neglecting mixture re-normalization causes a problem with the Hellinger Distance given by Equation (4.4). The problem lies in the simplification made in the final line of the Hellinger Distance equation.
The first integral term is one-half since the original mixture pdf is a valid Gaussian mixture pdf with the sum of its mixture weights constrained to one. However, the second integral term is not guaranteed to be one-half during operation of the HD MRA, since the reduced-component mixture weights are not re-normalized until the last step of the algorithm. Thus, the Hellinger Distance given by Equation (4.4) must be modified to allow for reduced-component mixture weights that are not normalized. This modification is accomplished by using Equation (2.23) (with appropriate modification to the mixture parameter notation):

(5.14)


since  the  integral  of  a  pdf  over  the  entire  sample  space  of  the  continuous  random 
quantity  it  describes  is  one. 
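The effect of this modification can be sketched numerically in Python. Because Equation (4.4) is not reproduced here, the sketch assumes the common form H² = ½∫(√f − √g)² dx, whose expansion gives the two integral terms discussed above plus a cross term ∫√(fg) dx. The cross term is estimated with a trapezoid rule on a grid, which is a numerical stand-in and not the truncated binomial series or "heuristic approximation" used in this chapter; all function names and grid settings are hypothetical.

```python
import math


def gauss(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mix_pdf(x, mix):
    """Evaluate a mixture of (w, mean, var) components at x."""
    return sum(w * gauss(x, m, v) for w, m, v in mix)


def sqrt_cross_term(f, g, lo=-20.0, hi=20.0, n=4001):
    """Trapezoid-rule estimate of the integral of sqrt(f(x) g(x)) over [lo, hi]."""
    h = (hi - lo) / (n - 1)
    vals = [math.sqrt(mix_pdf(lo + k * h, f) * mix_pdf(lo + k * h, g)) for k in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def hellinger_sq_modified(f, g):
    """Assumed squared Hellinger Distance with the second integral term replaced
    by half the sum of g's (possibly un-normalized) mixture weights."""
    first = 0.5                              # f is a normalized pdf
    second = 0.5 * sum(w for w, _, _ in g)   # integral of g equals its weight sum
    return first + second - sqrt_cross_term(f, g)


def hellinger_sq_unmodified(f, g):
    """Same expression with the second term fixed at one-half; only valid if g is normalized."""
    return 1.0 - sqrt_cross_term(f, g)


f = [(0.3, -1.0, 0.5), (0.2, 0.0, 1.0), (0.5, 2.0, 0.8)]
g = f[1:]   # one component deleted; weights now sum to 0.7

if __name__ == "__main__":
    print(hellinger_sq_modified(f, f))    # close to zero
    print(hellinger_sq_modified(f, g), hellinger_sq_unmodified(f, g))
```

With the deleted component's weight of 0.3, the unmodified expression overstates the squared distance by exactly 0.5 × 0.3 = 0.15, which is the error the modification removes.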


Figure 5.5: A flowchart of the Hellinger Affinity Measure MRA.

5.2.3.4 Hellinger Affinity Measure MRA. Figure 5.5 shows the flowchart for the HA MRA. The algorithm is essentially the same as the HD MRA, including a similar modification to the Hellinger Affinity Measure equation when the reduced-component mixture weights are not normalized, but special consideration is necessary when choosing the optimal value of this measure. Since the reduced-component mixture weights are not guaranteed to be normalized throughout the operation of the algorithm, the Hellinger Affinity Measure, given by Equation (4.6), is modified to account for this possibility:

(5.15)


Either the truncated binomial series or the “heuristic approximation” is applied to the numerator term to find an approximate closed-form solution. However, since the quantity on the right-hand side of Equation (5.15) represents the cosine of the angle between the square roots of the original and approximate pdfs, using the vector analogy developed in Subsection 4.1.1, this quantity must be bounded between -1 and +1. If either approximation produces values outside these bounds, the resulting measure is not valid. In this case, one may choose to declare the approximation inadequate and discard it.
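A minimal Python sketch of the modified measure and the validity check follows. Since Equation (5.15) is not reproduced here, it assumes the modified Hellinger Affinity Measure is ∫√(fg) dx / √(Σ w_j), which follows from the vector analogy: ||√f||² = 1 for the normalized original pdf, while ||√g||² equals the sum of the un-normalized reduced-component weights w_j. The trapezoid-rule cross term is a numerical stand-in for the approximations named above, and the function names are hypothetical.

```python
import math


def gauss(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mix_pdf(x, mix):
    return sum(w * gauss(x, m, v) for w, m, v in mix)


def sqrt_cross_term(f, g, lo=-20.0, hi=20.0, n=4001):
    """Trapezoid-rule estimate of the integral of sqrt(f(x) g(x))."""
    h = (hi - lo) / (n - 1)
    vals = [math.sqrt(mix_pdf(lo + k * h, f) * mix_pdf(lo + k * h, g)) for k in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def hellinger_affinity_modified(cross, g):
    """Assumed modified HA: cosine of the angle between sqrt(f) and sqrt(g).
    `cross` is any estimate of the integral of sqrt(f g)."""
    return cross / math.sqrt(sum(w for w, _, _ in g))


def is_valid_cosine(value):
    """A cosine must lie in [-1, 1]; otherwise the approximation is discarded."""
    return -1.0 <= value <= 1.0


f = [(0.3, -1.0, 0.5), (0.2, 0.0, 1.0), (0.5, 2.0, 0.8)]
g = f[1:]                                     # un-normalized: weights sum to 0.7
g_norm = [(w / 0.7, m, v) for w, m, v in g]   # re-normalized copy

if __name__ == "__main__":
    ha = hellinger_affinity_modified(sqrt_cross_term(f, g), g)
    ha_norm = hellinger_affinity_modified(sqrt_cross_term(f, g_norm), g_norm)
    print(ha, ha_norm, is_valid_cosine(ha))
    # A hypothetical approximate cross term of 0.9 gives 0.9 / sqrt(0.7) > 1: rejected.
    print(is_valid_cosine(hellinger_affinity_modified(0.9, g)))
```

The check is only needed because the cross term is approximated: an exact cross term always yields a value in [-1, 1] by the Cauchy-Schwarz inequality, and the modified measure gives the same value for g and its re-normalized copy.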


5.3 Mixture Reduction Algorithm Analysis

The new MRAs developed in the previous section are analyzed in this section. An analysis of the impact of neglecting mixture weight re-normalization during operation of the MRAs is presented in Subsection 5.3.1. Also, each new MRA is tested in Subsection 5.3.2 using two randomly-generated univariate Gaussian mixture pdfs, and the approximate reduced-component mixture pdf produced by each algorithm is qualitatively compared with the original, full-component mixture and with the outputs of the other MRAs. The algorithms are also compared with respect to the computation time each requires when coded using the same techniques.

5.3.1 Impact of Mixture Weight Re-Normalization. Subsection 5.2.2 mentions that mixture weight re-normalization of the reduced-component approximation Gaussian mixture pdf is neglected until near the last step of each MRA, resulting in improved computational speed. Using the Hilbert space vector analogy for Gaussian mixture pdfs of Section 4.1, this subsection shows that this shortcut does not affect MRAs based on true distance measure functions of the cosine of the angle between two vectors representing mixture pdfs, but that, in general, it does affect MRAs based on true distance measure functions of the error vector between two vectors representing mixture pdfs. That is, true distance measures of the cosine of the angle between two vectors representing Gaussian mixture pdfs are invariant to positive scalar transformations of the mixtures, such as re-normalization of mixture weights, but true distance measures of the error vector between the two mixture pdfs are not invariant to this type of transformation. Recall that the Correlation Measure calculates the cosine of the angle between a vector representing the original mixture pdf and another vector representing its reduced-component approximation, while the Hellinger Affinity Measure calculates the same quantity for vectors representing the square root of each pdf. Also, recall that the ISE cost function is the squared length of the error vector between the two vectors representing the original and approximate Gaussian mixture pdfs, and that the Hellinger Distance is the length of the error vector between two vectors representing the square root of each mixture pdf.

Figure 5.6 may be used to explain the effect of mixture weight re-normalization on the Hellinger Affinity Measure, Correlation Measure, Hellinger Distance, and ISE cost function. Imagine that a mixture component deletion action is executed and that the mixture weights of the approximation pdf are not re-normalized, so that the vector b, as shown in sub-plot (a), represents the reduced-component approximation. Also, consider the case in which mixture weight re-normalization is performed, so that the approximation pdf is represented by the scaled vector βb, as shown in sub-plot (b). Notice in sub-plots (a) and (b) that the angles θ1 and θ2 are equal, but that the lengths of the error vectors a - b and a - βb are different. Since the Correlation Measure is the cosine of the angle between two vectors, mixture weight re-normalization has no effect on this measure function. Likewise, mixture weight re-normalization has no impact on the Hellinger Affinity Measure as long as the appropriate modification, given by Equation (5.15), is made to this measure function. In contrast, the ISE cost function and the Hellinger Distance depend on the length of the vector representing the reduced-component approximation mixture pdf, so, in general, re-normalization of the approximation pdf mixture weights affects the output of these measure functions.

Figure 5.6: Two depictions, (a) and (b), of the error vectors and angles between the vectors a, b, and βb, which represent the original mixture pdf, a reduced-component approximation mixture pdf, and a scaled (re-normalized) version of the same reduced-component approximation pdf, respectively. Notice that the length of the error vector a - b is different from that of the error vector a - βb, but that the angles θ1 and θ2 are the same.
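The vector argument above can be checked numerically with a short Python sketch using closed-form Gaussian product integrals. It assumes the Correlation Measure is the normalized L2 inner product and the ISE is the squared L2 norm of the error vector, as recalled above; the example mixture and the helper names are hypothetical.

```python
import math


def gauss(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def inner(f, g):
    """L2 inner product of two mixtures of (w, mean, var) components."""
    return sum(wa * wb * gauss(ma, mb, va + vb) for wa, ma, va in f for wb, mb, vb in g)


def correlation_measure(f, g):
    """Cosine of the angle between the vectors representing f and g."""
    return inner(f, g) / math.sqrt(inner(f, f) * inner(g, g))


def ise(f, g):
    """Squared length of the error vector f - g."""
    return inner(f, f) - 2.0 * inner(f, g) + inner(g, g)


def scaled(g, beta):
    """The vector beta * g: every mixture weight multiplied by beta."""
    return [(beta * w, m, v) for w, m, v in g]


f = [(0.3, -1.0, 0.5), (0.2, 0.0, 1.0), (0.5, 2.0, 0.8)]  # vector a
b = f[1:]                               # deletion; weights sum to 0.7
beta = 1.0 / sum(w for w, _, _ in b)    # re-normalization factor
beta_b = scaled(b, beta)                # vector beta * b

if __name__ == "__main__":
    print("CM :", correlation_measure(f, b), correlation_measure(f, beta_b))  # equal
    print("ISE:", ise(f, b), ise(f, beta_b))                                  # differ
```

Scaling b by β multiplies <a, b> by β and ||b|| by β, so the cosine is unchanged, while the ISE picks up the terms -2(β - 1)<a, b> + (β² - 1)||b||².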

As an illustration of the role re-normalization plays in an MRA, consider reducing a 19-component univariate Gaussian mixture pdf to one containing only eight components using the ISE and CM MRAs, with and without mixture weight re-normalization⁷. Figure 5.7 shows the outputs of the ISE and CM MRAs when re-normalization of the reduced-component approximation mixture pdf is not performed (sub-plots (a) and (b)) and when it is performed (sub-plots (c) and (d)). As predicted by theory, the CM MRA produces the same result regardless of whether mixture weight re-normalization is performed during operation of the algorithm, since the Correlation Measure yields the same value for the reduced-component approximation pdf and for its re-normalized (scaled) version. However, a comparison of sub-plots (a) and (c) shows that the ISE MRA is dependent on mixture weight re-normalization, since the ISE cost function depends on the length of the vector representing the reduced-component approximation mixture pdf.


Although, in general, the ISE and HD MRAs are dependent on mixture weight re-normalization during their operation, multiple experiments with these MRAs have shown that this dependence is usually not strongly evident. For example, if the 19-component univariate Gaussian mixture pdf from the previous paragraph is reduced to a four-component mixture pdf instead of an eight-component one, then the reduced-component approximation Gaussian mixture pdf is the same whether or not mixture weight re-normalization is performed (although the intermediate steps of the reduction process differ between the two implementations). Considering the substantial computational savings of not re-normalizing the mixture weights until near the last step of an MRA, and the low likelihood of encountering a reduction task that would yield significantly different outputs depending on whether mixture weight re-normalization is performed, neglecting re-normalization until near the last step of an MRA appears to be worth any potential degradation in approximation quality. Therefore, none of the MRAs tested in the next subsection re-normalize the mixture weights of the reduced-component approximation pdf until near the end of the algorithm.


⁷The HD MRA and HA MRA are not considered, since evaluating their measure functions requires approximations that could obscure the effect of mixture weight re-normalization.
Figure 5.7: An illustration of the effect of re-normalizing the mixture weights of a reduced-component approximation Gaussian mixture pdf using the ISE and CM MRAs with and without re-normalization during operation of the algorithms: (a) ISE MRA without mixture weight re-normalization; (b) CM MRA without mixture weight re-normalization; (c) ISE MRA with mixture weight re-normalization; (d) CM MRA with mixture weight re-normalization. Each panel includes the original mixture for comparison. Notice that re-normalizing the mixture weights of the approximation pdf has no effect on the CM MRA output, but the ISE MRA produces different results (the circled portions of the pdf in sub-plot (c)) depending on whether or not mixture weight re-normalization is used.
5.3.2 Mixture Reduction Algorithm Test Results. Two randomly-generated univariate Gaussian mixture pdfs are used to test the fidelity of the reduced-component approximation Gaussian mixture pdfs produced by the new MRAs and by Williams' original ISE MRA. The MRAs are judged on the visual quality of their respective approximations to the full-component Gaussian mixture pdf and on the computation time each algorithm requires when coded using the same techniques. During testing, it was noted that the “heuristic approximation” for the Hellinger Distance and Hellinger Affinity Measure produced invalid distance measures: computed values of the Hellinger Distance were negative, and the Hellinger Affinity Measure produced outputs greater than +1. Since the resulting distance measures were invalid, the HD and HA MRAs based on this approximation were abandoned, and only the HD and HA MRAs based on the truncated binomial series approximation were considered.

The ISE, CM, ISE Shotgun, HA, and HD MRAs were tested against two randomly-generated univariate Gaussian mixture pdfs. Results for the first test, which was to reduce a 15-component mixture to a 10-component one, are shown in Figure 5.8. Of the five MRAs tested, the ISE Shotgun MRA produced the worst-looking approximation, and the ISE and CM MRAs generated reduced mixture approximations that appear almost identical to the original mixture pdf. The HD and HA MRAs produced questionable approximations, but required about half the computation time of the ISE and CM MRAs. Almost the same reduction in computation time was noted for the ISE Shotgun MRA.

Figure 5.9 displays the results of the second MRA test, which required each MRA to reduce a 19-component univariate Gaussian mixture pdf to a 5-component mixture. Again, the ISE and CM MRAs produced the best-looking approximations despite reducing the number of components in the original mixture pdf by almost 75%. However, in contrast to the first test, the ISE Shotgun MRA generated a visibly better reduced-component approximation than either of the Hellinger-based MRAs using the truncated binomial series approximation. As in the first test, the ISE Shotgun, HD, and HA MRAs took almost half the computation time required by the ISE and CM MRAs.

Figure 5.8: The first MRA test, consisting of a 15-component univariate Gaussian mixture pdf that is approximated by a 10-component mixture pdf using the corresponding MRA: (a) ISE MRA; (b) CM MRA; (c) ISE Shotgun MRA; (d) HA MRA; (e) HD MRA. Each panel compares the original and reduced mixtures. Mixture components are represented by dashed traces, while the complete mixture pdfs are depicted by solid traces.

Figure 5.9: The second MRA test, consisting of a 19-component univariate Gaussian mixture pdf that is approximated by a 5-component mixture pdf using the corresponding MRA: (a) ISE MRA; (b) CM MRA; (c) ISE Shotgun MRA; (d) HA MRA; (e) HD MRA. Mixture components are represented by dashed traces, while the complete mixture pdfs are depicted by solid traces.

Based on the qualitative assessments of the approximations produced by the MRAs in the two tests, the CM MRA is the best new candidate MRA (in addition to Williams' ISE MRA) for use in a Bayesian tracking in clutter algorithm. The HA and HD MRAs performed rather poorly, which is likely attributable to the crude approximation used in Equation (5.9). Utilizing higher-order terms in the binomial series of Equation (5.8) would likely improve performance. However, the added computational requirements may outweigh any performance gains, especially if the additional computations needed for the approximation bring the computational load close to that of the ISE and CM MRAs, which use exact closed-form solutions for their respective measure functions. Results for the ISE Shotgun MRA do not indicate that this MRA is a good candidate for implementation in a Bayesian tracking in clutter algorithm: in these tests, Greedy Algorithm B yielded significantly better MRA performance than Greedy Algorithm A.


5.4 Summary

Four new mixture reduction algorithms (MRAs) were developed, implemented, and tested in this chapter. MRAs based on the Hellinger Distance and Hellinger Affinity Measure used either the truncated binomial series approximation or the “heuristic approximation” suggested in [15]. However, the second approximation produced invalid distance measures, so it was abandoned in favor of the truncated binomial series approximation. Williams' original Integral Square Error (ISE) cost-function-based MRA was modified by replacing Greedy Algorithm B with Greedy Algorithm A to create the ISE Shotgun MRA, and it was also modified by replacing the ISE cost function with the Correlation Measure (CM) to produce the CM MRA. Of the four new MRAs proposed in this chapter, only the CM MRA appears suitable for use in a Bayesian tracking in clutter algorithm, since it alone yields performance comparable to that of the ISE MRA.
VI. Simulation Description & Results

In the previous chapter, the Correlation Measure mixture reduction algorithm (CM MRA) was selected as the best alternative to the Integral Square Error (ISE) MRA for the mixture reduction portion of a practical algorithm for Bayesian tracking in the presence of measurement origin uncertainty. The single-target in heavy clutter simulation scenario presented in [38] is used to test both the CM MRA and Williams' ISE cost-function-based MRA. Simulation results for each MRA are compared, and a final evaluation of the relative performance of the CM MRA is presented.


6.1 Description

The single-target in heavy clutter scenario found in [38] is used to test both the CM MRA and the ISE MRA. The target travels in the x-y plane according to the constant velocity (CV) model of Section 2.1:
where $E\{n_x(k)\} = E\{n_y(k)\} = 0$, $E\{n_x(k)n_x(l)\} = \delta_{kl}$, $E\{n_y(k)n_y(l)\} = \delta_{kl}$, and $E\{n_x(k)n_y(l)\} = 0$. Initial conditions for the Gaussian target state random process
At each scan k, the expected number of false-origin measurements, λV_s, is 480. The false-origin measurement clutter density, λ, is set to 0.012, and the surveillance region of the sensor at scan k is V_s = 200 × 200, so that λV_s = 0.012 × 40,000 = 480. This region is a box in the x-y plane centered at the true target location at sample k. The target-oriented data association approach and measurement gating are utilized, and the gate probability, P_G, is set to one (P_G is the probability that the true target measurement falls within the corresponding measurement gate). Measurement gating is accomplished using Williams' square gating routine, in which the gate is a square centered on a predicted measurement with a side length of twice the square root of the maximum eigenvalue of the residual covariance after scaling by the gate threshold. Finally, the probability of detection, P_D, is also set to one.
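The clutter and gating parameters above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Williams' MATLAB routine: the Poisson clutter count is drawn exactly by counting unit-rate exponential arrivals (Python's `random` module has no Poisson sampler), false measurements are uniform over the 200 × 200 box, and the gate half-side is √(γ λ_max(S)) for a residual covariance S and gate threshold γ. The names `poisson`, `clutter`, `in_square_gate`, the example covariance, and the value of γ are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Exact Poisson draw: count arrivals of a unit-rate process in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def clutter(rng, center, side=200.0, density=0.012):
    """False-origin measurements, uniform over a side-by-side box about `center`.
    The expected count is density * side**2 = 0.012 * 40000 = 480."""
    cx, cy = center
    n = poisson(rng, density * side * side)
    half = side / 2.0
    return [(cx + rng.uniform(-half, half), cy + rng.uniform(-half, half)) for _ in range(n)]


def max_eig_2x2(s):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = s
    return 0.5 * (a + d) + math.sqrt((0.5 * (a - d)) ** 2 + b * b)


def in_square_gate(z, z_pred, s, gamma):
    """Square gate centred on z_pred with side 2 * sqrt(gamma * lambda_max(S))."""
    half = math.sqrt(gamma * max_eig_2x2(s))
    return abs(z[0] - z_pred[0]) <= half and abs(z[1] - z_pred[1]) <= half


if __name__ == "__main__":
    rng = random.Random(12345)      # fixed seed, so repeated runs see identical data
    truth = (0.0, 0.0)
    scan = clutter(rng, truth)
    S = [[4.0, 1.0], [1.0, 3.0]]    # hypothetical residual covariance
    gated = [z for z in scan if in_square_gate(z, truth, S, gamma=9.0)]
    print(len(scan), "false measurements,", len(gated), "inside the gate")
```

A square gate only needs two absolute-value comparisons per measurement, which is cheaper than an ellipsoidal (Mahalanobis) test while still enclosing the corresponding ellipse.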

Two hundred Monte Carlo simulations were run using Williams' MATLAB® code [38, 40, 41], with the maximum number of mixture components set to 1, 5, 10, 15, 20, 25, 30, and 35 components, using both the ISE and CM MRAs. The CM MRA was implemented by replacing the ISE cost function with the Correlation Measure in Williams' ISE MRA MEX C-code and modifying the decision criterion accordingly. Each simulation determined the total number of scans for which the target was tracked before track loss occurred (i.e., the track life). Track loss occurs if either of the following criteria is met:

(i) the true target measurement is not within the measurement gates of any of the hypothesized tracks for five consecutive scans, or

(ii) the discrepancy between the combined target state mean estimate for every hypothesized track and the combined true target state is greater than ten standard deviations¹ for five consecutive scans.
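The two track-loss criteria can be expressed as a run-length test, sketched below in Python. The per-scan inputs are assumed to have been computed already: a flag saying whether the true measurement fell outside every hypothesized track's gate, and the smallest discrepancy over hypothesized tracks in units of the clutter-free Kalman filter standard deviation (so criterion (ii) holds when even the best track exceeds ten). The function name and the convention of counting only the scans before the failing run are hypothetical; the bookkeeping in Williams' code may differ.

```python
def track_life(gate_miss, min_deviation, window=5, threshold=10.0):
    """Number of scans tracked before track loss.

    gate_miss[k]: True if the true target measurement was outside the gates of
        every hypothesized track at scan k (criterion i).
    min_deviation[k]: smallest discrepancy, over hypothesized tracks, between
        the combined estimate and the true state, in standard deviations
        (criterion ii holds when this exceeds `threshold`).
    Loss is declared once either condition holds for `window` consecutive
    scans; the scans in that failing run are not counted (assumed convention).
    """
    miss_run = dev_run = 0
    for k, (miss, dev) in enumerate(zip(gate_miss, min_deviation)):
        miss_run = miss_run + 1 if miss else 0
        dev_run = dev_run + 1 if dev > threshold else 0
        if miss_run >= window or dev_run >= window:
            return k + 1 - window
    return len(gate_miss)


if __name__ == "__main__":
    misses = [False] * 10 + [True] * 5 + [False] * 15
    print(track_life(misses, [1.0] * 30))   # loss after scan 10
```

Resetting each counter whenever its condition fails is what makes the test require consecutive scans rather than five scans in total.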


Pseudo-random number generators were seeded with predetermined values at the beginning of each simulation so that both MRAs were presented with exactly the same measurement data before track loss occurred. The track life results of these simulations are presented in the next section.


6.2  Results 

The results from the simulation scenario described in the last section are covered in this section. Figure 6.1 depicts the average track life for the CM and ISE MRAs as a function of the maximum number of mixture components over each set of two hundred Monte Carlo trials. As expected, the average track life improves as the number of mixture components increases, since including more components in the target state pdf approximation produces a better representation of the original target state Gaussian mixture pdf. The CM MRA appears to outperform the ISE MRA slightly in some cases, while the opposite is true in other cases; overall, however, the average track life differences between the two MRAs are statistically insignificant.

Figure 6.2 shows the percentage and number of individual trials in which the track life of the CM MRA was exactly the same as, better than, and worse than the track life of the ISE MRA. For instance, the bar on the far left of the figure (for the case in which the maximum number of mixture components is set to one) shows that a large percentage (99%) of the trials resulted in exactly the same track life for both algorithms, and that in the remaining two trials the MRAs produced different track life results. In one of those two trials, the CM MRA produced a longer track life than the ISE MRA, and the ISE MRA generated a longer track life in the other. Although it may seem somewhat surprising that the majority of the trials resulted in exactly the same track life figures for the CM and ISE MRAs, recall that these MRAs produced the same reduced-component univariate Gaussian mixture pdfs, as shown in Figures 5.8 and 5.9, for the two tests conducted in Subsection 5.3.2, so this outcome should not be unexpected. Notice that, in cases in which the track life results of the two MRAs differ, the CM MRA outperforms the ISE MRA for most settings of the maximum number of mixture components. However, this graphic does not indicate by how many scans the CM MRA track life is longer in these cases. This information is contained in the next figure.

¹As calculated by a Kalman filter applied to the same simulation scenario in the absence of clutter [31, 38].

Figure 6.1: Average track life results for the CM and ISE MRAs (plot title: Mixture Reduction Algorithm Average Track Life Comparison). The figures within the coarsely-dashed boxes correspond to CM MRA results, while the numbers inside the finely-dashed boxes are for the ISE MRA.

Figure 6.2: The percentage and number of trials in which the track life of the CM MRA was exactly the same as, better than, and worse than that of the ISE MRA (plot title: Individual Monte Carlo Run Track Life Comparison, 200 Runs).


For cases in which the track life for the CM and ISE MRAs differs, Figure 6.3 shows the average, maximum, and minimum track life disparities between the two MRAs². The upper plot, (a), shows that, for four of the seven settings for the maximum allowable number of mixture components, the average track life disparity in cases in which the CM MRA outperformed the ISE MRA is larger than that in cases in which the ISE MRA outperformed the CM MRA. In five of the seven settings, the maximum track life difference is larger in trials in which the CM MRA outperformed the ISE MRA. Sub-plot (b) is an enlarged version of sub-plot (a) that shows values for which the track life disparity between the CM and ISE MRAs is less than ten scans. Notice that track life difference metrics are not shown for the ISE MRA when the maximum allowable number of mixture components is set to 30 since, as shown in Figure 6.2, the ISE MRA did not outperform the CM MRA in any of the trials.
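The per-setting statistics plotted in Figures 6.2 and 6.3 can be computed from paired track lives with a short Python sketch. The function name and the dictionary layout are hypothetical, and the example input is synthetic: it only mirrors the single-component case reported in footnote 2 (198 identical trials, one trial in which the CM MRA leads by 22 scans, and one in which the ISE MRA leads by 21 scans), not the actual simulation output.

```python
from statistics import mean


def compare_track_lives(cm, ise):
    """Paired comparison of per-trial track lives for one MRA setting."""
    if len(cm) != len(ise):
        raise ValueError("track-life lists must be paired trial by trial")
    same = sum(a == b for a, b in zip(cm, ise))
    cm_margin = [a - b for a, b in zip(cm, ise) if a > b]
    ise_margin = [b - a for a, b in zip(cm, ise) if b > a]

    def summarize(d):
        # (average, maximum, minimum) disparity, or None if no such trials
        return (mean(d), max(d), min(d)) if d else None

    return {
        "same_pct": 100.0 * same / len(cm),
        "cm_better": len(cm_margin),
        "ise_better": len(ise_margin),
        "cm_margin": summarize(cm_margin),
        "ise_margin": summarize(ise_margin),
    }


if __name__ == "__main__":
    ise_lives = [50] * 200                  # synthetic track lives
    cm_lives = [50] * 198 + [72, 29]        # CM +22 in one trial, ISE +21 in another
    print(compare_track_lives(cm_lives, ise_lives))
```

Returning None when one MRA never wins corresponds to the missing ISE entries for the 30-component setting in Figure 6.3.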


²The single mixture component simulations were not included in this figure to minimize clutter. In the two trials in which the CM and ISE MRAs differed in track life, the CM MRA outperformed the ISE MRA by 22 scans in the first instance, while the ISE MRA outperformed the CM MRA by 21 scans in the other trial.
Figure 6.3: The average, maximum, and minimum track life differences between the CM and ISE MRAs in the trials in which the MRAs produced different track life results (plot title: Track Life Differences when CM or ISE Better; sub-plot (b) is an enlarged version of sub-plot (a)).


6.3 Summary

This chapter presented the single-target in heavy clutter simulation scenario used to test the track life performance of the Correlation Measure mixture reduction algorithm (CM MRA), implemented as the mixture reduction mechanism of a Bayesian tracking in clutter algorithm, in comparison to Williams' Integral Square Error (ISE) MRA. Various track life metrics were obtained for the CM MRA and compared to those for the ISE MRA using the same simulation scenario. The average track life differences between the two MRAs are statistically insignificant. Since the CM and ISE MRAs differ only in the measure function used in each algorithm, and each measure function requires the evaluation of the same component terms, computation time for both MRAs is almost exactly the same under the condition that both MRAs produce the same track life³. Trials in which the track life figures for the two MRAs differed were further analyzed, but a conclusive declaration about the superiority of one MRA over the other could not be made based on the test data.


³In general, the scalar multiplication and division operations used to compute the Correlation Measure are more computationally costly than the scalar addition and subtraction operations used in the ISE cost function. However, it was noted in various simulations that the run times for the ISE MRA and CM MRA were virtually identical.
VII. Conclusions & Recommendations


This  chapter  concludes  the  thesis  by  summarizing  its  important  findings.  A 
restatement  of  the  research  goal  is  presented  to  reacquaint  the  reader  with  the 
objective  of  the  thesis,  followed  by  a  summary  of  key  results.  Significant  contributions 
contained  in  this  thesis  are  highlighted,  and  recommendations  for  future  research  are 
included  in  the  final  section. 


7.1  Restatement  of  the  Research  Goal 

Equation (2.65) of Chapter II (modified for a single target state random process vector),

$$f\big(x(k), \theta(k) \mid Z^k\big) = \sum_{i_k \in N_H(k)} \cdots \sum_{i_1 \in N_H(1)} p\big(\theta^{i_1,\ldots,i_k} \mid Z^k\big)\, f\big(x(k) \mid \theta^{i_1,\ldots,i_k}, Z^k\big),$$

is the Bayesian solution for tracking a target in clutter. As new measurements are received at subsequent scans, new summations are added, and the number of terms needed to evaluate the target state multivariate Gaussian mixture pdf becomes computationally unrealistic. Thus, some type of approximation is necessary to implement the rigorous Bayesian solution for the target state pdf.
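The growth described above is simple arithmetic, sketched here in Python. The sketch assumes each nested summation contributes |N_H(l)| data-association hypotheses at scan l, so the unreduced mixture has the product of those counts as its number of terms; the figure of five hypotheses per scan is hypothetical, and the cap of 35 components only echoes the largest setting used in Chapter VI.

```python
from math import prod


def full_mixture_size(hypotheses_per_scan):
    """Terms in the unreduced mixture: product over scans of |N_H(l)|."""
    return prod(hypotheses_per_scan)


def capped_mixture_size(hypotheses_per_scan, max_components):
    """Mixture size after each scan when an MRA reduces the mixture back to
    at most max_components after every measurement update."""
    size, history = 1, []
    for n in hypotheses_per_scan:
        size = min(size * n, max_components)  # expand by n hypotheses, then reduce
        history.append(size)
    return history


if __name__ == "__main__":
    scans = [5] * 20                          # hypothetical: 5 hypotheses per scan
    print(full_mixture_size(scans))           # 95367431640625 terms after 20 scans
    print(capped_mixture_size(scans, 35))     # [5, 25, 35, 35, ...]
```

Without reduction, twenty scans with five hypotheses each already require about 9.5 × 10^13 terms, whereas an MRA keeps the mixture at a fixed, user-selected size.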


A mixture reduction algorithm (MRA) is one method of approximation. When tracking a single target in heavy clutter while retaining a large number of mixture components, Williams' Integral Square Error (ISE) MRA has been shown to provide longer average track life than any other published algorithm [38, 50, 51]. However, recommendations in [38] indicate the potential for improving upon the results produced by Williams' algorithm. Thus, the goal of this research is to create a new MRA that offers better tracking performance and/or decreased computation time compared to Williams' ISE MRA.


7.2  Summary  of  Results 

Four new MRAs were developed in an attempt to meet the research goal. The motivation for three of the four new MRAs, the Integral Square Error (ISE) Shotgun, Hellinger Distance (HD), and Hellinger Affinity Measure (HA) MRAs, was the prospect of decreased computation time with tracking performance comparable to that of the ISE MRA. However, improved performance in one aspect of a design often comes at the cost of decreased performance in another, and the ISE Shotgun, HD, and HA MRAs were no exception. Although all three MRAs required roughly half the computation time of Williams' original ISE MRA,¹ the reduced-component univariate Gaussian mixture pdf approximations they produced were rather poor, as shown in Figures 5.8 and 5.9.


The fourth new MRA, the Correlation Measure (CM) MRA, provided much better results than the other new MRAs, but it offered only a slight improvement over Williams' ISE MRA. Figures 5.8(b) and 5.9(b) clearly show that the reduced-component mixture pdf approximations generated by the CM MRA closely match the original univariate Gaussian mixture pdf. In fact, in these examples the CM MRA made the same approximations as the ISE MRA. Although this is not true in general (as the differing track life results of the two MRAs show), it likely occurred in over ninety percent of the simulation trials, which yielded exactly the same track life for both MRAs, as depicted in Figure 6.2.


¹ This statement pertains only to the MRAs implemented for this thesis. These algorithms were coded using the same blocks of code whenever possible to minimize the impact of implementation specifics on algorithm run times.

Geometrically, the similarity between the Correlation Measure and the ISE cost function, inferred from the descriptions of each measure function given in Subsection 4.1.1, may explain this phenomenon. Using a vector analogy, the original Gaussian mixture pdf and its reduced-component approximation may be thought of as two vectors a and b, respectively, in Hilbert space, as shown in Figure 7.1. The Correlation Measure is the cosine of the angle θ between the two mixture pdfs, and the ISE cost function is the squared length of the error vector (a - b) under the standard Euclidean (L2) norm. According to the Correlation Measure, two mixtures are perfectly matched if the angle between them is zero. However, two mixtures with a zero angle between them may not be perfectly matched in the sense of (a - b) being zero, as shown in Figure 7.1(b). In this case, the ISE cost function provides a better measure of the similarity between the two mixtures, since it indicates that the two mixtures in Figure 7.1(b) are not perfectly matched (the error vector is non-zero). This scenario illustrates one way in which the Correlation Measure and ISE cost function can produce different results, which would explain the disparity in track life observed in certain simulation trials. In contrast, Figure 7.1(c) shows the only scenario in which the ISE cost function and Correlation Measure both indicate a perfect match between the two mixtures, since both the angle and the error vector are zero. This case may explain why over ninety percent of the simulation trials produced exactly the same track life. However, this explanation implies that the vectors representing the full-component and reduced-component mixture pdfs are essentially collinear and of essentially the same length over ninety percent of the time, which seems unlikely. This explanation of why more than ninety percent of the simulation trials produced exactly the same track life is therefore not completely satisfying.
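The three cases of Figure 7.1 can be reproduced with ordinary two-dimensional vectors. In this minimal sketch, the cosine of the angle plays the role of the Correlation Measure and the squared length of the error vector plays the role of the ISE cost; the vectors and function names are arbitrary illustrative choices, not mixture pdfs from the simulations.

```python
import math


def cosine(a, b):
    """Cosine of the angle between vectors a and b (Correlation Measure analogue)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def squared_error(a, b):
    """Squared Euclidean length of the error vector (a - b) (ISE analogue)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


cases = {
    "(a) non-zero angle, non-zero error": ([1.0, 0.0], [1.0, 1.0]),
    "(b) zero angle, non-zero error": ([1.0, 1.0], [2.0, 2.0]),
    "(c) zero angle, zero error": ([1.0, 1.0], [1.0, 1.0]),
}

for name, (a, b) in cases.items():
    print(f"{name}: cos = {cosine(a, b):.3f}, |a - b|^2 = {squared_error(a, b):.3f}")
```

Case (b) shows why the two measures can disagree: scaling b by any positive constant leaves the cosine unchanged but changes the squared error.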

Despite the "zero angle, non-zero error vector" deficiency of the Correlation Measure pointed out above, the CM MRA slightly outperformed the ISE MRA in some respects for the tracking scenario considered in the simulations. For six of the eight settings of the maximum number of reduced mixture components, the CM MRA produced average track life figures as good as, or slightly better than, those of the ISE MRA, as shown in Figure 6.1. In cases in which the track life figures of the two MRAs differed, the CM MRA had more trials with longer track life than the ISE MRA for seven of the eight settings of the maximum number of reduced mixture components (see Figure 6.2). Additionally, other metrics, depicted in Figure 6.3, indicate that the CM MRA outperformed the ISE MRA by a small margin. However, in all cases in which the CM MRA slightly outperformed the ISE MRA, the differences in performance are statistically insignificant. In addition, computational requirements for both MRAs are effectively the same, since the measure functions for both MRAs share the same components. Therefore, based on the simulation results, neither MRA can objectively be preferred over the other on the basis of tracker performance.

Figure 7.1: (a) Two vectors with a non-zero angle and a non-zero error vector. (b) Two vectors with a zero angle and a non-zero error vector. (c) Two vectors with a zero angle and a zero error vector.

7.3  Significant  Contributions  of  Research 

Although the CM MRA did not decisively outperform the ISE MRA, it provides a viable and readily implemented alternative for Bayesian tracking algorithms that incorporate Williams' MRA. The CM MRA was shown to provide slightly better performance than the ISE MRA in certain respects for the simulation scenario described in Section 6.1, and it may also provide slightly better performance in other scenarios (although this is not guaranteed). Additionally, the CM MRA may be fully implemented without mixture weight re-normalization during operation of the algorithm, as noted in Subsection 5.3.1, thus drastically reducing the run time of the MRA. Since all of the components of the ISE cost function (the two self-likeness terms and the one cross-likeness term), given by Equation (4.2), are also used in the Correlation Measure, given in Equation (4.5), one measure function may readily be replaced by the other. This interchangeability of the two measure functions provides a designer with an added degree of freedom.
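For univariate Gaussian mixtures, the self-likeness and cross-likeness terms have a closed form, because the integral of the product of two Gaussian pdfs N(x; m1, v1) and N(x; m2, v2) equals N(m1; m2, v1 + v2). The sketch below computes the three terms once and derives both measures from them. It assumes the Correlation Measure takes the normalized-inner-product (cosine) form described in Section 7.2 rather than reproducing Equation (4.5) verbatim; the moment-preserving merge, function names, and component values are hypothetical illustrations. It also shows why no weight re-normalization is needed for the CM: scaling every weight of the reduced mixture leaves the CM unchanged, whereas the ISE changes.

```python
import math

# A univariate Gaussian mixture is a list of (weight, mean, variance) tuples.


def gauss(x, mean, var):
    """Univariate Gaussian pdf N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def likeness(f, g):
    """Closed-form integral of f(x) g(x) dx for two univariate Gaussian mixtures."""
    return sum(wf * wg * gauss(mf, mg, vf + vg)
               for wf, mf, vf in f
               for wg, mg, vg in g)


def ise_and_cm(full, reduced):
    """Return (ISE cost, Correlation Measure) built from the same three terms."""
    j_ff = likeness(full, full)         # self-likeness of the full mixture
    j_rr = likeness(reduced, reduced)   # self-likeness of the reduced mixture
    j_fr = likeness(full, reduced)      # cross-likeness term
    ise = j_ff - 2.0 * j_fr + j_rr      # squared L2 length of the error vector
    cm = j_fr / math.sqrt(j_ff * j_rr)  # cosine of the angle between the mixtures
    return ise, cm


def merge(c1, c2):
    """Moment-preserving merge of two weighted components (assumed reduction action)."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


full = [(0.5, 0.0, 1.0), (0.3, 0.5, 1.2), (0.2, 4.0, 0.8)]
reduced = [merge(full[0], full[1]), full[2]]
scaled = [(2.0 * w, m, v) for w, m, v in reduced]  # un-normalized weights

print(ise_and_cm(full, reduced))
print(ise_and_cm(full, scaled))  # same CM, different ISE
```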

7.4 Recommendations for Future Research

Several future research topics may be spawned from the research presented in this thesis. First, a hybrid Greedy Algorithm A/Greedy Algorithm B assignment algorithm, mated with either the ISE cost function or the Correlation Measure, could be developed that operates according to a measure threshold. This new MRA would combine the iterative operation of Greedy Algorithm B with the computational efficiency of Greedy Algorithm A, iteratively executing all reduction actions whose computed measures satisfy a threshold criterion. Essentially, it would operate similarly to the ISE Shotgun MRA, but instead of executing all reduction actions in one step, the algorithm would execute only those reduction actions with measures that meet the threshold criterion (similarly to Salmond's Joining algorithm). For example, if the ISE cost function is used, the algorithm would iteratively execute reduction actions whose costs are below some pre-specified threshold. Thereafter, Greedy Algorithm B could be used to drive the reduced mixture to the final desired number of components. This new MRA should decrease computation time and might suffer less from the poor mixture approximation performance of the ISE Shotgun MRA.
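The hybrid MRA proposed above might be organized as in the following sketch. It is hypothetical throughout: the function names are illustrative, only pairwise moment-preserving merge actions are considered (no deletions), each action is scored by the ISE between the original mixture and the mixture after the action, and the two phases only approximate the roles of Greedy Algorithms A and B rather than reproducing their definitions from earlier chapters. Phase 1 scores all candidate merges once per pass and executes every non-overlapping merge whose cost does not exceed the threshold; Phase 2 then performs one re-scored, cheapest merge per iteration until the desired number of components remains.

```python
import math
from itertools import combinations

# Hypothetical sketch: a univariate mixture is a list of (weight, mean, variance) tuples.


def gauss(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def likeness(f, g):
    """Closed-form integral of f(x) g(x) dx for two univariate Gaussian mixtures."""
    return sum(wf * wg * gauss(mf, mg, vf + vg) for wf, mf, vf in f for wg, mg, vg in g)


def ise(f, g):
    return likeness(f, f) - 2.0 * likeness(f, g) + likeness(g, g)


def merge(c1, c2):
    """Moment-preserving merge of two weighted components."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def merged(mix, i, j):
    """Copy of mix with components i and j replaced by their merge."""
    return [c for k, c in enumerate(mix) if k not in (i, j)] + [merge(mix[i], mix[j])]


def merge_costs(original, mix):
    """(cost, i, j) for every pairwise merge, cheapest first; cost is ISE vs. the original."""
    return sorted((ise(original, merged(mix, i, j)), i, j)
                  for i, j in combinations(range(len(mix)), 2))


def hybrid_reduce(original, target, threshold):
    """Reduce original to target components (assumes target >= 1)."""
    mix = list(original)
    # Phase 1: score once per pass, then execute every non-overlapping merge
    # whose cost does not exceed the threshold (no re-scoring within a pass).
    while len(mix) > target:
        used, new_components = set(), []
        for cost, i, j in merge_costs(original, mix):
            if cost > threshold or len(mix) - len(new_components) <= target:
                break
            if i not in used and j not in used:
                used.update((i, j))
                new_components.append(merge(mix[i], mix[j]))
        if not new_components:
            break
        mix = [c for k, c in enumerate(mix) if k not in used] + new_components
    # Phase 2: greedy iteration, re-scoring after every single merge.
    while len(mix) > target:
        _, i, j = merge_costs(original, mix)[0]
        mix = merged(mix, i, j)
    return mix


full = [(0.3, 0.0, 1.0), (0.3, 0.1, 1.0), (0.2, 5.0, 1.0), (0.2, 5.2, 1.0)]
print(hybrid_reduce(full, target=2, threshold=1e-3))
```

Setting the threshold to zero essentially reduces the sketch to a purely iterative greedy reduction, while a large threshold moves it toward the batch behavior of the ISE Shotgun MRA, so the threshold trades computation time against approximation quality.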

A second potential topic for future research is using the EM algorithm of Chapter III to generate a reduced-component Gaussian mixture pdf from a full-component one. Equation (3.24) could be used as a sample-based approach to generating the reduced-component approximation to the original target state pdf. This topic would require a radical departure from the MRAs described in this thesis and, as a result, is a riskier potential research topic than the first one.
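A minimal sketch of the sample-based idea follows: draw samples from the full-component mixture, then fit a mixture with the desired number of components using the standard EM iteration for univariate Gaussian mixtures. This is an assumption-laden illustration, not an implementation of Equation (3.24), whose exact form is not reproduced here. The function names, initialization, sample size, and iteration count are all hypothetical, and responsibilities are computed in the log domain to avoid underflow.

```python
import math
import random


def log_gauss(x, mean, var):
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def sample_mixture(mix, n, rng):
    """Draw n samples from a univariate mixture of (weight, mean, variance) tuples."""
    comps = rng.choices(mix, weights=[w for w, _, _ in mix], k=n)
    return [rng.gauss(m, math.sqrt(v)) for _, m, v in comps]


def em_reduce(samples, init, iters=50):
    """Fit a len(init)-component mixture to samples with the EM algorithm."""
    mix, n = list(init), len(samples)
    for _ in range(iters):
        # E-step: normalized responsibilities, computed with a log-sum-exp shift.
        resp = []
        for x in samples:
            logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in mix]
            top = max(logs)
            p = [math.exp(lg - top) for lg in logs]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate the weight, mean, and variance of each component.
        new_mix = []
        for k in range(len(mix)):
            nk = max(sum(r[k] for r in resp), 1e-12)  # guard against an empty component
            m = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            v = sum(r[k] * (x - m) ** 2 for r, x in zip(resp, samples)) / nk
            new_mix.append((nk / n, m, max(v, 1e-9)))
        mix = new_mix
    return mix


rng = random.Random(0)
full = [(0.5, 0.0, 1.0), (0.3, 0.3, 1.0), (0.2, 6.0, 0.5)]
samples = sample_mixture(full, 1000, rng)
print(em_reduce(samples, init=[(0.5, -1.0, 1.0), (0.5, 7.0, 1.0)]))
```

Because the fit depends on random samples, repeated reductions of the same mixture can differ slightly, which is one way this approach departs from the deterministic MRAs of this thesis.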

Finally, new approximations could possibly be developed for those true and pseudo-distance measure functions listed in Section 4.1 that do not have exact closed-form solutions. New approximations for the Hellinger Distance and Hellinger Affinity Measure could be made by using a different series approximation than the truncated binomial series developed in Subsection 5.2.1.2. Also, as suggested in correspondence with Williams [39], a closed-form approximation to the Kullback-Leibler Mean Information and Kullback-Leibler Divergence could be developed. These pseudo-distance measure functions are popular in the literature, and an adequate approximation may already exist.
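Any new closed-form approximation needs a reference against which its accuracy can be checked. One option, sketched below for univariate mixtures, is brute-force numerical integration of the Hellinger affinity, the integral of sqrt(f(x) g(x)), with the trapezoidal rule. The sketch assumes normalized mixtures and the convention H = sqrt(1 - affinity); the definitions used earlier in the thesis may differ by a constant factor, and the function names, integration limits, and grid size are illustrative choices that must cover the mixtures' support.

```python
import math


def mixture_pdf(mix, x):
    """Evaluate a univariate mixture of (weight, mean, variance) tuples at x."""
    return sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
               for w, m, v in mix)


def hellinger_affinity(f, g, lo, hi, n=20001):
    """Trapezoidal-rule estimate of the integral of sqrt(f(x) g(x)) over [lo, hi]."""
    h = (hi - lo) / (n - 1)
    vals = [math.sqrt(mixture_pdf(f, lo + i * h) * mixture_pdf(g, lo + i * h))
            for i in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def hellinger_distance(f, g, lo, hi, n=20001):
    """H = sqrt(1 - affinity), clipped at zero to absorb quadrature round-off."""
    return math.sqrt(max(0.0, 1.0 - hellinger_affinity(f, g, lo, hi, n)))


# Sanity check against the exact affinity of two single Gaussians,
# sqrt(2 s1 s2 / (s1^2 + s2^2)) * exp(-(m1 - m2)^2 / (4 (s1^2 + s2^2))).
f, g = [(1.0, 0.0, 1.0)], [(1.0, 1.0, 2.0)]
s1, s2 = 1.0, math.sqrt(2.0)
exact = math.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * math.exp(-1.0 / (4 * (s1**2 + s2**2)))
print(hellinger_affinity(f, g, -15.0, 15.0), exact)
```

A candidate series approximation could then be scored by its deviation from this reference over a set of test mixtures.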